(Note: apologies, the same message was posted twice. This post is the revised version; for some reason I'm unable to delete or even edit the original.)
I first reported my issue on Aug 16, 2019.
After I RMA-ed every component in my system and ran problem-free for a year and a half following my initial report, the same (or a similar) issue has started again: random shutdowns and reboots, like in the summer of 2019.
I initially believed the culprit was a faulty PSU (which went through RMA, and was deemed faulty by Seasonic), but now I’m not so sure that was even the problem to begin with.
The only difference from 2019 is that I no longer get the WHEA-Logger ID 18 error. The Event Viewer only provides the generic “Kernel-Power, event-ID 41, Task 63, Keywords 0x8000400000000002: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.” There are no minidump files and BugcheckCode = 0; there’s really nothing diagnostic to go on.
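Since the reboots are so erratic, it can help to pull every Kernel-Power 41 timestamp out of the log and look for a pattern (time of day, interval, workload). Below is a minimal Python sketch, assuming the System log was saved from Event Viewer as CSV. The column names ("Date and Time", "Source", "Event ID") are an assumption about that export, the function name is hypothetical, and the sample rows are made up for illustration, not taken from this system.

```python
import csv
import io


def unclean_reboots(csv_text, event_id="41", source_suffix="Kernel-Power"):
    """Return the timestamp strings of all Kernel-Power 41 rows.

    Column names are an assumption about the Event Viewer CSV export;
    adjust them if your export uses different headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        source = (row.get("Source") or "").strip()
        eid = (row.get("Event ID") or "").strip()
        # Accept both "Kernel-Power" and "Microsoft-Windows-Kernel-Power".
        if eid == event_id and source.endswith(source_suffix):
            hits.append((row.get("Date and Time") or "").strip())
    return hits


# Made-up sample rows, for illustration only.
SAMPLE = """Level,Date and Time,Source,Event ID,Task Category
Critical,2021-01-20 14:02:11,Kernel-Power,41,(63)
Information,2021-01-20 14:02:30,Kernel-General,12,None
Critical,2021-01-22 09:47:55,Microsoft-Windows-Kernel-Power,41,(63)
"""

if __name__ == "__main__":
    for timestamp in unclean_reboots(SAMPLE):
        print(timestamp)
```

The timestamps are kept as plain strings because the date format in the export depends on the Windows locale.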
It can happen when the system is idling, it has happened while gaming, it’s happened while working. It’s completely unpredictable. Since I’m working from home and my PC is my livelihood, I’m exasperated.
The system runs in Ryzen balanced power mode. There’s no overclock; all BIOS settings are optimized defaults, except for RAM timings (G.Skill pre-set profile 1) and fan curves.
I’ve changed out the RAM, disabled XMP, rolled back BIOS, flashed BIOS, tested every available BIOS from Asus, rolled back chipset drivers, etc. I’ve reinstalled Windows 10 a number of times. I put in an older GTX 1050Ti with the same results. I’m now on a fresh install (installed on 2021-01-05) of Win 10 Pro 20H2.
I’ve also replaced/RMA-ed the following:
I also recently switched cooling from the NZXT Kraken X72 (after TWO (2) RMAs!!), to the EK-AIO Elite 360. I contacted EKWB for them to check some of my temps, and they noticed some anomalies:
“I do see some weird voltage to your CPU, but I am not sure how is that possible as it looks like your core voltage is boosting over 1.475 Volts … I think you have the same problem as JayzTwoCents encountered in his review of 3900x where motherboard BIOS/Uefi settings are set way too high as it was the first BIOS version. … Your temperatures are actually ok for the massive Voltage that is applied to it. But I would still lower them as it affects the longevity of the CPU. But the CPU is still young and we do not have data on the Ryzen degradation with time, but based on the older CPUs, high Voltage degrades the CPU and will start to boost lower and lower with time. And it affects the stability of the system overall if it is too high.”
I can’t believe that the newest Asus BIOS would still allow this kind of core voltage. Or is this the CPU literally overpowering the motherboard?
I’m writing this while running in “power saver” mode, with voltages not going over 1.0V. It’s been running stable for 2 days. But since the restarts are so erratic, they might start here too. Plus running ‘power saver’ defeats the purpose of this CPU.
I have already started the process of RMA for the CPU and motherboard.
However, I am really concerned about what happens after I receive the new CPU and motherboard. What if these units eventually develop the same issues as my current RMA units? What if this happens after the 3-year warranty period? Do I just have to buy the newer model?
This is my first AMD build, and the least I expect is for it to run stable. The CPU/mobo clearly does not function 'out of the box' when installed correctly and with the most up-to-date chipset drivers and BIOS.
It seems to me that this might be a manufacturing issue; would all this not be grounds for a product recall, if it’s so widespread? Though I imagine that will only happen when systems start to catch fire… But still, is this not a consumer advocacy issue?
RMA-ing the same components over and over again is not a solution.
Any input would be appreciated.
Update: the system now crashes and reboots after critical Kernel Power event 43 in 'power saver' mode. So I'm starting to doubt it has anything to do with 'massive voltages' applied to the CPU core, since the core voltage never goes above 1.0V in power saver.
I'm still waiting for word back from AMD. I'm hoping for an advance replacement, since I cannot afford (literally) any additional downtime.
Flash the latest BIOS for your MoBo.
Keep in mind: some MoBos require you to flash 1-3 other BIOS versions first before you can flash the latest one...
PS: AMD is releasing its new AGESA right now, so there will be a new BIOS within 1-6 weeks for many MoBos 😉
I've been updating BIOS religiously, with every new release. I'll check though if a manual flash via the flashback USB port on the mobo makes a difference.
I see Asus posted a beta BIOS with the new AGESA, but I think I'll hold off until the final release; I'm not that brave.
I hate like hell to run a beta BIOS too, but I mean, random reboots are an emergency situation. AGESA 126.96.36.199 helped a lot for me.
I guess I might as well try - I’ve nothing to lose at this point.
The system just rebooted during a file transfer, corrupting the SSD I was copying files to. I managed to repair and reformat the drive, and the lost data was backed up anyway, but it’s a distressing sign.
Anyhow, is it true that a core voltage boosting over 1.475 Volts is abnormal? It sometimes hits 1.5V max. No overclock, just optimized defaults in BIOS.
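To put numbers on how often the core voltage actually crosses that mark, a logged sensor CSV can be scanned for samples above a limit. This is only a rough Python sketch: the column name "Vcore [V]" and the function name are hypothetical (use whatever header your monitoring tool writes), the sample readings are invented, and the 1.475 V limit is simply the figure quoted by EKWB above, not an official AMD specification.

```python
import csv
import io


def over_threshold(csv_text, column="Vcore [V]", limit=1.475):
    """Summarise a voltage column: sample count, maximum, samples above limit.

    `column` is a hypothetical header name; `limit` is the figure quoted
    in this thread, not a vendor specification.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    readings = []
    for row in reader:
        try:
            readings.append(float(row[column]))
        except (KeyError, ValueError, TypeError):
            # Skip rows with a missing column or a non-numeric reading.
            continue
    if not readings:
        return {"samples": 0, "max": None, "over": 0}
    return {
        "samples": len(readings),
        "max": max(readings),
        "over": sum(1 for v in readings if v > limit),
    }


# Invented readings, for illustration only.
SAMPLE = """Time,Vcore [V]
0,0.981
1,1.456
2,1.481
3,1.500
4,n/a
"""

if __name__ == "__main__":
    print(over_threshold(SAMPLE))
```

A summary like this only shows how frequently the spikes occur; it does not by itself say whether they are abnormal for this CPU.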
Silicon is more efficient at lower temperatures, and precision boost will clock higher based on temperature and power usage.
My clocks are moderate/low: max spikes of 4.2-4.3 GHz, so no massive overclocks at all. It's the voltages that are the issue.