Hi there, we bought our first two Dell PowerEdge R7525 servers with 2x EPYC 7313 processors and we have a pretty serious issue, which I also read about in a Reddit post: after a reboot of the host, VMs fail to start. What basically happens is that you start up around 10 VMs, and then usually the last 5 or so get stuck. The vmms.exe service half crashes but keeps running; it can no longer read the configs of those VMs, and if you open the settings of one of them you only see "loading". The only fix is rebooting the host again, or manually killing the VMMS process along with the Hyper-V Host Compute Service.

So it seems the Hyper-V Compute Service kind of half crashes. There is a VMQ event written into the System log, and the guys in the Reddit post struggling with this believe the Fibre Channel driver is causing it, but I have seen Hyper-V crash without that event being written. We installed the latest Fibre Channel drivers and moved the VMs to the Ethernet controllers, and it still happens, so that is not the whole story.

Very weird issue. We run 2022 on most of our older Intel boxes and don't have this problem. This is serious: somewhere between AMD, Dell and Microsoft there is a bug in the hypervisor, and last week it took about 5 restarts to get all of our production VMs up... Any additional guidance will be greatly appreciated...
Hi, we have a similar, if not the same, issue. Our troublemaker is a Hyper-V cluster with two Supermicro servers, each with one EPYC 7313 and Windows Server 2022. Hyper-V VMs get stuck in the Starting/Stopping state, and if we open the VM properties it shows "loading" only. Only killing and restarting VMMS.exe helps, but not for long.
Our Hardware (2X):
Mainboard: Supermicro H12SSW-NT, Socket SP3, with 2x 10G on board (Broadcom chipset)
Processor: AMD EPYC 7313, Socket SP3
NIC: Mellanox ConnectX-6 Lx MCX631102AN-ADAT, 25G dual SFP28, PCIe 4.0 x8
So we also have no solution yet but keep working on it.
Hi there, sorry I missed the reply. Yes, the quickest way I found to get them running, other than rebooting the host, was to shut the VMs down, kill the vmms and compute processes, start those back up, and then start the VMs one by one until your luck holds and you can start all of them. I found a Reddit post earlier with the same issue; they went down the driver route for the Fibre Channel cards because a networking warning occurs, and it worked for one of them, but for me, updating the drivers on the Fibre Channel card didn't help.
Not sure why my replies aren't going through, but sorry for the late reply, I was on leave. Ja, I saw it on other forums as well; they also went down the driver route for the Fibre Channel card, but updating it didn't work for me. I have to kill both the vmms and the compute service and then try to start the VMs up one by one and hope for the best.
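For anyone else stuck, the workaround described above can be sketched roughly like this in PowerShell. This is just a sketch of what we do by hand, not an official fix; the service/process names (vmms = Virtual Machine Management Service, vmcompute = Hyper-V Host Compute Service) are the standard Hyper-V ones, but run it elevated and at your own risk.

```powershell
# Sketch of the manual workaround: restart the stuck Hyper-V services,
# then start the VMs one at a time instead of all at once.

# 1. Kill the hung management and compute services
Stop-Service -Name vmms -Force -ErrorAction SilentlyContinue
Stop-Process -Name vmms -Force -ErrorAction SilentlyContinue       # if Stop-Service hangs
Stop-Service -Name vmcompute -Force -ErrorAction SilentlyContinue
Stop-Process -Name vmcompute -Force -ErrorAction SilentlyContinue

# 2. Bring them back up (compute service first, then management)
Start-Service -Name vmcompute
Start-Service -Name vmms

# 3. Start the VMs one by one, waiting for each to leave the "Starting" state
foreach ($vm in Get-VM | Where-Object State -eq 'Off') {
    Start-VM -VM $vm
    while ((Get-VM -Name $vm.Name).State -eq 'Starting') {
        Start-Sleep -Seconds 5
    }
}
```

In our experience the one-by-one start is the important part; batch-starting everything is what seems to trip vmms up.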
Hi, in the meantime we identified that specific VMs cause the issue!? As soon as we start them up, the trouble starts. Because we can reproduce the issue, our IT service provider opened a ticket with Microsoft. So far we have not identified why those VMs cause the issue.
Will post here as soon as we have more information.
Hi, Microsoft support identified the probable cause as an incompatibility or bug in the combination of Hyper-V, its block tracking mechanism RCT (Resilient Change Tracking) used by Veeam Backup, and AMD CPUs.
We were told a patch for this is in development between Microsoft and Veeam and should be ready in the next few weeks. We are still waiting for details on this.
Any updates on this case? We are experiencing the exact same symptoms, also on AMD-based systems running Server 2022.
We have tried pretty much everything: different NICs, upgrading Veeam to the latest version 12, and other things. Pretty much given up by now.
Hi guys, just to let you know you're not alone, I'm in the same situation with the same combination: AMD + Veeam + Hyper-V not starting VMs or letting me change settings via GUI or PowerShell. Most of the time the problem hits once I reach 32 GB, so I don't think it's a specific VM in my case. Anyhow, if some patch appears that would be good.
We resolved our problem by upgrading the VMs to the latest configuration version (10) for Server 2022, and setting NUMA settings to “use hardware topology”. It turned out to be a NUMA problem.
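In PowerShell terms, the fix above corresponds roughly to the following. This is a sketch only: the VM name is a placeholder, and the "use hardware topology" step is shown via `Set-VMProcessor` NUMA limits that you would derive from your own host (e.g. via `Get-VMHostNumaNode`); the values below are illustrative, not a recommendation.

```powershell
# Sketch: upgrade the VM configuration version and realign virtual NUMA
# with the physical host. The VM must be shut down for both changes.

Stop-VM -Name 'MyVM'

# Upgrade the configuration version to the Server 2022 level (10.0)
Update-VMVersion -Name 'MyVM' -Confirm:$false

# Reset virtual NUMA limits to match the host topology; 16 cores per
# node/socket here is a placeholder taken from our hypothetical host
Set-VMProcessor -VMName 'MyVM' `
    -MaximumCountPerNumaNode 16 `
    -MaximumCountPerNumaSocket 16

Start-VM -Name 'MyVM'
```

Note that upgrading the configuration version is one-way; once on version 10 the VM can no longer be moved back to an older Hyper-V host.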