(note: Apologies: I see that the same message was posted twice (revised in this post), but I'm unable to delete or even edit the original post for some reason)
I first reported my issue on Aug 16, 2019.
After RMA-ing every component in my system and then running problem-free, a year and a half after my initial report the same (or a similar) issue has started again: random shutdowns and reboots, like in the summer of 2019.
I initially believed the culprit was a faulty PSU (which went through RMA, and was deemed faulty by Seasonic), but now I’m not so sure that was even the problem to begin with.
The only difference from 2019 is that I no longer get the WHEA-Logger ID 18 error. Event Viewer only provides the generic “Kernel-Power, event-ID 41, Task 63, Keywords 0x8000400000000002: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.” There are no minidump files and BugcheckCode = 0; there’s really nothing diagnostic to go on.
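For anyone who wants to check the same fields on their own machine, here is a minimal Python sketch. It assumes you have copied the event's XML from Event Viewer (Details tab, XML view) into a string. The sample XML below is made up to mirror the values described above, and `summarize_event` is a hypothetical helper name, not part of any Windows tool. The hints follow the usual reading of event 41 (no stop code usually points to a hard hang or power loss), so treat them as a starting point rather than a diagnosis.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Illustrative sample only: values chosen to match the symptoms in this post.
SAMPLE_EVENT_XML = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" />
    <EventID>41</EventID>
    <Task>63</Task>
    <Keywords>0x8000400000000002</Keywords>
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>
"""


def summarize_event(event_xml: str) -> dict:
    """Pull the few fields that matter for an unexpected-reboot event."""
    root = ET.fromstring(event_xml.strip())
    event_id = int(root.findtext("e:System/e:EventID", namespaces=NS))
    provider = root.find("e:System/e:Provider", NS).get("Name")

    # EventData is a flat list of <Data Name="...">value</Data> entries.
    data = {
        d.get("Name"): (d.text or "")
        for d in root.findall("e:EventData/e:Data", NS)
    }
    bugcheck = int(data.get("BugcheckCode", "0"))
    power_button = int(data.get("PowerButtonTimestamp", "0"))

    if bugcheck != 0:
        hint = "stop code recorded; look for a minidump"
    elif power_button != 0:
        hint = "power button appears to have been held down"
    else:
        hint = "no stop code; consistent with a hard hang or power loss"

    return {
        "provider": provider,
        "event_id": event_id,
        "bugcheck_code": bugcheck,
        "hint": hint,
    }


summary = summarize_event(SAMPLE_EVENT_XML)
print(summary)
```

With BugcheckCode = 0 and no power-button timestamp, the log cannot tell a PSU, motherboard, or CPU fault apart, which is why swapping parts ends up being the only test.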
It can happen when the system is idling, it has happened while gaming, it’s happened while working. It’s completely unpredictable. Since I’m working from home and my PC is my livelihood, I’m exasperated.
The system runs in Ryzen balanced power mode. There’s no overclock; all BIOS settings are optimized defaults, except for RAM timings (G.Skill pre-set profile 1) and fan curves.
I’ve changed out the RAM, disabled XMP, rolled back BIOS, flashed BIOS, tested every available BIOS from Asus, rolled back chipset drivers, etc. I’ve reinstalled Windows 10 a number of times. I put in an older GTX 1050Ti with the same results. I’m now on a fresh install (installed on 2021-01-05) of Win 10 Pro 20H2.
I’ve also replaced/RMA-ed the following:
I also recently switched cooling from the NZXT Kraken X72 (after TWO (2) RMAs!!), to the EK-AIO Elite 360. I contacted EKWB for them to check some of my temps, and they noticed some anomalies:
“I do see some weird voltage to your CPU, but I am not sure how is that possible as it looks like your core voltage is boosting over 1.475 Volts … I think you have the same problem as JayzTwoCents encountered in his review of 3900x where motherboard BIOS/Uefi settings are set way too high as it was the first BIOS version. … Your temperatures are actually ok for the massive Voltage that is applied to it. But I would still over them as it affects the longevity of the CPU. But the CPU is still young and we do not have data on the Ryzen degradation with time, but based on the older CPUs, high Voltage degrades the CPU and will start to boost lower and lower with time. And it affects the stability of the system overall if it is too high.”
I can’t believe that the newest Asus BIOS would still allow for this kind of core voltage? Or is this the CPU literally overpowering the motherboard?
I’m writing this while running in “power saver” mode, with voltages not going over 1.0V. It’s been running stable for 2 days. But since the restarts are so erratic, they might start here too. Plus running ‘power saver’ defeats the purpose of this CPU.
I have already started the process of RMA for the CPU and motherboard.
However, I am really concerned about the situation after receiving the new CPU and motherboard. What if these units eventually develop the same issues as my current RMA units? What if this happens after the 3-year warranty period? Do I just have to buy the newer model?
This is my first AMD build, and the least I expect is for it to run stable. The CPU/mobo clearly does not function 'out of the box' when installed correctly and with the most up-to-date chipset drivers and BIOS.
It seems to me that this might be a manufacturing issue; would all this not be grounds for a product recall, if it’s so widespread? Though I imagine that will only happen when systems start to catch fire… But still, is this not a consumer advocacy issue?
RMA-ing the same components over and over again is not a solution.
Any input would be appreciated.
Update: the system now crashes and reboots after critical Kernel Power event 43 in 'power saver' mode. So I'm starting to doubt it has anything to do with 'massive voltages' applied to the CPU core, since the core voltage never goes above 1.0V in power saver.
I'm still waiting for word back from AMD. I'm hoping for an advance replacement, since I cannot afford (literally) any additional downtime.
flash the latest BIOS for your MoBo
keep in mind: some MoBos require you to flash 1-3 other BIOS versions first before you can flash the latest one...
ps AMD is releasing its new AGESA right now = there will be a new BIOS within 1-6 weeks for many MoBos ;)
I've been updating BIOS religiously, with every new release. I'll check though if a manual flash via the flashback USB port on the mobo makes a difference.
I see Asus posted a beta BIOS with the new AGESA, but I think I'll hold off until the final release; I'm not that brave ;)
I guess I might as well try - I’ve nothing to lose at this point.
The system just rebooted during a file transfer, corrupting the SSD I was copying files to. I managed to repair and reformat the drive, and the lost data was backed up anyway, but it’s a distressing sign.
Anyhow, is it true that a core voltage boosting over 1.475 Volts is abnormal? It sometimes hits 1.5V max. No overclock, just optimized defaults in BIOS.
To illustrate, this is an image tech support from EKWB (on a different issue) sent back to me, underlining some voltages they thought were abnormal, considering my clock speeds are all running as expected.
those are totally normal voltages when you run your Ryzen at "stock" - it pumps voltage to raise clocks under LOW LOAD!
if you have the same voltages at full load (all-core AVX etc.) then you have a problem - and it will damage your CPU...
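One way to check this on your own system is to log sensors to CSV (HWiNFO can do this) and look only at samples taken under heavy load. The Python sketch below is a rough illustration: the column names, the sample rows, and the 1.40 V / 90% thresholds are all assumptions for the example, not AMD specifications, so adjust them to your own log and to whatever limits you trust.

```python
import csv
import io

# Assumed column names; real sensor logs label these differently per board.
TIME_COL = "Time"
VOLTAGE_COL = "Vcore [V]"
LOAD_COL = "CPU Usage [%]"

# Made-up sample log: high voltage at idle, lower voltage under load,
# plus one suspicious sample (high voltage AND high load).
SAMPLE_LOG = """Time,Vcore [V],CPU Usage [%]
12:00:01,1.481,2.0
12:00:02,1.469,3.5
12:00:03,1.325,98.0
12:00:04,1.450,99.0
"""


def high_voltage_under_load(rows, volt_limit=1.40, load_floor=90.0):
    """Return (time, volts, load) for samples that are both hot and busy.

    volt_limit and load_floor are illustrative thresholds only.
    """
    flagged = []
    for row in rows:
        volts = float(row[VOLTAGE_COL])
        load = float(row[LOAD_COL])
        # High voltage at low load is expected boost behaviour, so skip it.
        if load >= load_floor and volts > volt_limit:
            flagged.append((row[TIME_COL], volts, load))
    return flagged


rows = list(csv.DictReader(io.StringIO(SAMPLE_LOG)))
flagged = high_voltage_under_load(rows)
print(flagged)
```

In this sample, the 1.48 V readings at 2-3% load are ignored and only the 1.45 V sample at 99% load is flagged, which is the distinction being made above.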
That's the thing though: support at ASUS and EKWB tell me these are abnormal. Getting conflicting assessments of identical values just makes things more difficult. On the other hand, I've seen screenshots with similar voltages as mine in other threads with AMD commenting they are completely within range, so I've no reason to believe they would actually be abnormal.
The main issue is the lack of diagnostic info to go on. All I get is "event-ID 41", without any minidump files, or other clues.
For my previous RMA I did get WHEA-Logger 18 errors, but now I'm not getting these either.
Hate to bring this back up from the dead but I am basically where you are at right now:
5950x - RMA'd
EVGA 3090 - RMA'd
PSU - swapped out from Corsair HX1000i to Asus 1200W THOR
Board - still on the same first one Dark Hero Crosshair VIII
NVME - 980 Pro 2TB (was previously 1TB Samsung 970 Evo)
4x16GB DDR4 Trident Z Neo, running at stock and at XMP (3600MHz)
Same Kernel-Power Event ID 41 issues. I'm tearing my hair out, as this is my first personal AMD rig after coming from Intel; I've made numerous 5950X/5900X builds, but this is the first I've owned. Between the AGESA issues and now the WHEA issues, the only thing keeping me here is the 16 cores.
Sorry to hear that.
What fixed it for me was a second RMA of both my CPU (3900X) and my Mobo (Crosshair VIII wifi), something I would not wish on anyone. But I see you already RMA'd almost your entire system, unless you can do a full RAM test and make sure that's not the issue (though I seriously doubt that would be the culprit; my set of Neos passed all tests and I still got the kernel panic).
Though I see your mobo is still the original - so I suspect it might be the mobo. Have you used a different CPU with the same mobo before?
What helped for me was contacting Seasonic (after RMA-ing my SSR-1000TR PSU, after having swapped from a Corsair HX850) and EKWB (since I'm using an EK-AIO Elite 360 for cooling), since my temps & voltages were really high just idling. I sent them screenshots with readings from HWinfo and they actually got technicians to get back to me, noticing abnormal spikes in voltages beyond the expected ranges together with high temps (60C on 1-2% usage). The people at EKWB were really helpful, but I'm not sure what cooling you're using.
It might be worth a shot to send some data to the manufacturers/technical dept, especially if there's a history of previous RMAs.
So after another RMA of my CPU and Mobo it's now running stable at normal voltages and temps. The thing is, I cannot tell whether the mobo or the CPU was at fault, since this was my first AMD system and I had no other parts to diagnose with.
Sorry for reviving this topic. I hope you've already solved your problem!
I have the same error symptoms. I already RMA'd the CPU, but the problem continues.
The problem appears when I run a memory benchmark test.
The stress tests go on for hours and hours and the error does not appear.
Can you guide me somehow?
Sorry for bad english..
if you're running RAM @3600, try 3200 ... depending on the BIOS, I had problems like that ... you won't feel the difference anyway ... if you know it's a RAM setting problem, you can always tweak later ...
try beginning with 3200 and stay like that until you're sure whether it's ok or not
if you don't know the timings for 3200, use DRAM Calculator for Ryzen
set the first 6 numbers (like 14 14 14 14 28 42, but less aggressive, something beginning with 16), then tFAW, tRFC and tRFC2; that should do it, with the rest on auto
i have 2x4400 sticks that run @4400 (but only for, let's say, 1 day; then the computer begins to act weird)
and another 2x4400 sticks that don't hold 4400 on my cpu (does the ram controller depend on the cpu?)
4x4400 is not possible, so i first configured them at 4x3600, until one day i got the same problems as you after some bios updates (i'm guessing)
i searched a lot and finally came to the conclusion that the only REALLY stable frequency, 24 hours a day without tweaking voltages, is 3200
and i did not try higher, as benchmarks didn't show me any difference, sometimes even better with lower frequencies but higher timings ... so now i run my 4x4400 at 3200 CL 14 without any tweaking, because i prefer my pc perfectly stable and smooth instead of trying to gain nothing with ram, and next time i buy ram, i won't pay more than 100 euro for 16gb anymore lol
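To put rough numbers on "you won't feel the difference": the first-word CAS latency in nanoseconds is CL × 2000 / (data rate in MT/s), since DDR memory transfers twice per clock. The Python sketch below compares 3200 CL14 (the setting mentioned above) with 3600 at CL16; the CL16 figure is an assumption for illustration, so check your own kit's profile. It only looks at CAS latency, not bandwidth or the other timings.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds for DDR memory.

    The memory clock is half the data rate, so one cycle lasts
    2000 / data_rate_mts nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


# 3200 CL14 is from the post above; 3600 CL16 is an assumed comparison point.
settings = {"3200 CL14": (3200, 14), "3600 CL16": (3600, 16)}

latencies = {
    name: cas_latency_ns(rate, cl) for name, (rate, cl) in settings.items()
}

for name, ns in latencies.items():
    print(f"{name}: {ns:.2f} ns")
```

Under these assumptions the two settings come out at 8.75 ns and about 8.89 ns, so dropping to 3200 with tighter timings gives up some bandwidth but not latency.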
A late reply, but: I'm running G.Skill Trident Z Neo DDR4-3600MHz (4 x 8GB); I usually just set it to the preset on the motherboard, at 3600 with the timings pre-set in the profile. However, even when disabling the OC profile and setting everything to default 2133, the system still shuts down and reboots randomly.
I had a set of Corsair Vengeance LPX before this, and those were unstable, also at non-OC; but at least I could diagnose the issue fairly easily from the event codes in the logger; it was quite obvious the RAM was the issue. All I have to go on now is event-ID 41, with no minidump files and no diagnostic info whatsoever.
Anyhow, both Asus and AMD requested I send back the board and CPU; I just received a replacement motherboard, and I'm still waiting on the CPU.
But I'll keep your comments in mind if the same issue were to crop up again (fingers crossed it does not!). I've never worked with DRAM Calculator for Ryzen though, so it might be a bit of a learning curve for me.