Hello all,
These problems seem to be the same ones we noticed with EPYC CPUs.
In general you guys need a newer kernel and should boot with idle=nomwait.
Also look there for more info on why MWAIT didn't work:
epyc 7551 spontaneously resets after 10mins rendering
BR
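For anyone trying the idle=nomwait suggestion: on Ubuntu it is usually appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. The sketch below, assuming Python 3 on Linux, checks that the parameter actually made it onto the running kernel's command line by reading /proc/cmdline. The helper name `has_kernel_param` is hypothetical, not part of any existing tool.

```python
"""Check whether a kernel boot parameter is active (Linux only).

A minimal sketch: /proc/cmdline holds the command line the running
kernel was booted with; we look for an exact key or key=value token.
"""
from pathlib import Path
from typing import Optional


def has_kernel_param(cmdline: str, key: str, value: Optional[str] = None) -> bool:
    """Return True if `key` (or `key=value` when value is given) appears
    as a whitespace-separated token in the kernel command line.

    Quoted values containing spaces are not handled; that is fine for
    simple flags such as idle=nomwait.
    """
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        if name != key:
            continue
        # Bare flag requested, or exact key=value match.
        if value is None or (sep and val == value):
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        active = has_kernel_param(path.read_text(), "idle", "nomwait")
        print("idle=nomwait active:", active)
    else:
        print("/proc/cmdline not found (not running on Linux?)")
```

If it reports False after a reboot, the GRUB change most likely was not regenerated or the wrong boot entry was used.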
I have been having these exact same issues since I got my system. Typically I come to the computer in the morning and find it locked up, and I have to reset it. On occasion, though, when the system is fairly idle, it will freeze while I'm in the middle of using it: the mouse pointer freezes on the screen, the system stops responding to pings, and I have to reset it. This usually happens every couple of days, but I have had it happen three times in one day. I have been through every BIOS version and custom kernels, and swapped almost every piece of hardware, to no avail.
Asus PRIME X370-PRO
AMD Ryzen 1700
Corsair Vengeance LED DDR4 3200 MHz 2 x 8 GB (x2, for a total of 32 GB)
MSI Radeon RX 580 Gaming X 8G
-> upgraded from Asus R9 280X in an attempt to fix lockups
Corsair HX1000i PSU
-> upgraded from Corsair TX750W in an attempt to fix lockups
Samsung 960 PRO NVMe M.2 PCIe x4 SSD 1TB
-> got rid of all spinning disks in an attempt to fix lockups
I also swapped the keyboard, mouse, and monitors, got rid of USB devices, etc.
I started with Ubuntu 16.04, then moved to Arch, and now I'm back on Ubuntu, currently 18.04.
Did you try the BIOS setting everyone has been talking about, or disabling the C6 power state?
Today my Ryzen 7 1800X, on an Asus Prime X370-Pro, has 67 days of uptime and has been idle for all that time.
So I have not suffered a lockup since upgrading the BIOS to 4011 (with "AGESA 1.0.0.2a + SMU 43.18") and setting the "Advanced Mode -> Advanced -> AMD CBS -> Power Supply Idle Control" option to "Typical Current Idle".
I did run BIOS 4011 for a while and still had crashes. I am currently running 4012 and came to a crashed system this morning.
I set the "Power Supply Idle Control" option earlier this morning after seeing this thread, so I will post an update in a couple of days on whether it has helped.
Before changing that setting:
> last | grep boot
reboot system boot 4.15.0-23-generi Mon Jul 16 08:47 still running
reboot system boot 4.15.0-23-generi Mon Jul 16 05:21 - 08:44 (03:22)
reboot system boot 4.15.0-23-generi Sat Jul 14 12:51 - 08:44 (1+19:52)
reboot system boot 4.15.0-23-generi Sat Jul 14 12:08 - 12:32 (00:23)
reboot system boot 4.15.0-23-generi Fri Jul 13 07:33 - 10:49 (1+03:16)
reboot system boot 4.15.0-23-generi Wed Jul 11 12:32 - 10:49 (2+22:16)
And now...
09:46:13 up 4 days, 59 min, 2 users, load average: 0.32, 0.50, 0.55
It's looking promising.
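In the `last | grep boot` output above, each finished session ends with its length in parentheses, either `(HH:MM)` or `(D+HH:MM)`, while the current boot shows "still running". A small sketch, assuming Python 3, that converts those lengths to minutes makes before/after comparisons of a BIOS change easier. The helpers `duration_minutes` and `session_lengths` are hypothetical names for illustration only; the sample lines are copied from the output above.

```python
import re
from typing import List, Optional

# Matches the session length `last` prints at the end of a line,
# e.g. "(03:22)" or "(1+19:52)" meaning days+hours:minutes.
DURATION_RE = re.compile(r"\((?:(\d+)\+)?(\d+):(\d{2})\)\s*$")


def duration_minutes(line: str) -> Optional[int]:
    """Return the session length in minutes, or None if the line has
    no duration (e.g. the current boot, shown as "still running")."""
    m = DURATION_RE.search(line)
    if m is None:
        return None
    days = int(m.group(1) or 0)
    hours, minutes = int(m.group(2)), int(m.group(3))
    return days * 24 * 60 + hours * 60 + minutes


def session_lengths(lines: List[str]) -> List[int]:
    """Collect durations of all finished sessions, skipping the rest."""
    return [d for d in map(duration_minutes, lines) if d is not None]


if __name__ == "__main__":
    sample = """\
reboot system boot 4.15.0-23-generi Mon Jul 16 08:47 still running
reboot system boot 4.15.0-23-generi Mon Jul 16 05:21 - 08:44 (03:22)
reboot system boot 4.15.0-23-generi Sat Jul 14 12:51 - 08:44 (1+19:52)
reboot system boot 4.15.0-23-generi Sat Jul 14 12:08 - 12:32 (00:23)
reboot system boot 4.15.0-23-generi Fri Jul 13 07:33 - 10:49 (1+03:16)
reboot system boot 4.15.0-23-generi Wed Jul 11 12:32 - 10:49 (2+22:16)"""
    for mins in session_lengths(sample.splitlines()):
        print(f"{mins // 60}h {mins % 60:02d}m")
```

On a live system you would feed it the lines of your own `last` output instead of the sample string.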
@jesse_amd, would you help those of us who have been suffering from the idle lockup bug with Ryzen for so, so long?
We are asking you to comment because you seem to have helped customers with EPYC CPUs solve a similar problem.
The Bugzilla ticket is 196683 – Random Soft Lockup on new Ryzen build
With everything new and at stock settings, we always have to go into the BIOS and disable the "Global C state control" setting to make the system stable.
Otherwise, the system usually locks up when it is idle. Other findings include, but are not limited to, setting the Power Supply Idle current to "common current idle".
We would be glad to have a BIOS firmware fix, so that the system would remain stable with the default BIOS settings.
Even the recent BIOS updates, for instance the one that updates the firmware to AGESA 1.0.0.2a + SMU 43.18, have not helped.
Kindly help!
07:13:53 up 11 days, 22:27, 2 users, load average: 0.73, 0.81, 0.77
Still no crashes! I think the "Power Supply Idle Control" may have solved the issue.
The "Typical Current Idle" setting seems to have fixed it for me. I only changed this setting in UEFI; everything else is at default. Four days of uptime without an incident, running the latest BIOS (4012) and kernel 4.18.0-rc7-mainline on an Asus Prime X370-Pro with a 1700. I had put up with random freezes ever since I built this system. I upgraded to faster RAM, thinking it could be the culprit. I have been updating to each new BIOS as soon as it came out and running the latest kernel, hoping something would fix it. Hopefully my system will finally be stable; fingers crossed.
I have tried pretty much everything: the 4.15 LTS kernel on Ubuntu and 4.17 on Fedora. None of the above methods really eliminates the problem for me. I still get random freezes on reboot or after the PC has been idle overnight. A few months back I switched from a Ryzen 5 2400G to a Ryzen 5 1600X, hoping the older model without onboard graphics would be more Linux-friendly.
Overall I'm pretty put off by AMD now; there shouldn't be all this manual tweaking just for the basics.