Mainboard: MSI x570 Unify
Mainboard-BIOS: 7C35vA82 (Beta version)
CPU: Ryzen 5900x
RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2)
Drive: M.2 Samsung 970 Evo+ 1TB SSD
Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT
PSU: be quiet! Straight Power 11 750W Platinum
OS: Win 10 Pro (64bit) - all updates installed
Chipset driver: 2.9.28.509 (released 2020-11-09)
I first assembled the PC with a Ryzen 3800X a week ago because it was unclear if and when I would get the Ryzen 5900X I had ordered. It ran with the included AMD Wraith Prism CPU cooler for one week without any problems.
- Today I installed a Ryzen 5900x and a Scythe Fuma 2 CPU cooler.
- After 20 min the first crash/restart with the following entries in the Event Viewer: WHEA-Logger ID 18 and critical error Kernel-Power ID 41.
- Happens irregularly again and again, sometimes after minutes, sometimes longer: Windows freezes for a few seconds and then the PC reboots. It doesn't matter whether the system is under load or not.
- CPU temperature between 30 and 40 °C
- Updated to the BIOS and chipset driver mentioned above: problem still exists
- XMP profile disabled (RAM at 2600 MHz): problem still exists
- CMOS reset: problem still exists
Either there is a compatibility problem of something with the CPU, or the CPU is defective?
What to do? Really frustrating.
I'm having a similar issue with an X570 Aorus and a 5600X. I get the same errors in Windows.
Disable CPB and PBO and run it at default settings (3.7 GHz and XMP on). That works for me.
I got a new angle on this. Deactivating PBO and CPB definitely works; the PC has been running stable for a week now. But you'll lose performance.
So I wrote to the MSI support and the AMD support.
MSI suggested increasing the DRAM voltage by 0.05 V, which I did. The system seems to be stable, no crashes so far, neither at idle nor while gaming.
AGESA 1.2.0.0. Still crashing
Still random crashing with latest AGESA.
And PSU idle is set to typical in CBS settings? Those are the most common culprits.
I've had my PSU idle setting at Typical since it was first recommended. The only thing I've changed recently was disabling DOCP and setting the RAM manually: 3200 as officially supported for the CPU, with the fabric at 1600. Will see if the random crashes continue.
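The 3200/1600 pairing follows from DDR4 being double data rate: a kit rated at 3200 MT/s has a 1600 MHz memory clock, and running the fabric clock (FCLK) at that same frequency gives a 1:1 ratio. A minimal sketch of that arithmetic in Python; the function name is purely illustrative and nothing here reads or changes BIOS settings:

```python
def fclk_for_one_to_one(ddr_rate_mts: int) -> int:
    """Return the fabric clock in MHz that matches the memory clock 1:1.

    DDR memory transfers data twice per clock, so the real memory clock
    is half of the advertised rate (e.g. DDR4-3200 runs at 1600 MHz).
    """
    if ddr_rate_mts <= 0 or ddr_rate_mts % 2:
        raise ValueError("expected a positive, even DDR rate in MT/s")
    return ddr_rate_mts // 2


# Illustrative helper name; the rates are the ones discussed in this thread.
for rate in (3200, 3600, 3800):
    print(f"DDR4-{rate}: FCLK {fclk_for_one_to_one(rate)} MHz for 1:1")
```

The other RAM/FCLK pairs mentioned in this thread (3600/1800 and 3800/1900) follow the same rule.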
Same boat here... pissed to have moved to AMD!
Issues with 5800x so replaced it as soon as I found a 5950x but terrible experience.
Everything at stock was overheating. Had to upgrade the whole cooling system. It ran great for 2 days, and now when gaming I get random reboots with error 19, then 18.
Really tired of AMD!
My config:
+ 5950x
+Asus Crosshair VIII Dark Hero
+64 GB Crucial Ballistix Max 4400 that I have to run on 3600 with AMD CPUs... (3800 max) vs 4400 on Intel
+2x WD_BLACK SN850 2 TB NVMe
+ Corsair 1500W PSU
+Asus ROG Strix 3090
Hi!
If your CPU is stable after running some stress tests, try setting PCIe to Gen 3; sometimes the GPU fails on Auto.
Wow, incredible setup. Good news, you do not need to buy anything! Just make adjustments. What GPU do you have?
My 5900X is working fine after I tracked the changes I made in the BIOS and their effect on performance, by monitoring voltages/current on the cores, memory, SoC, etc. on the board and eliminating the Windows power/chip management controls. When do we start blaming the board manufacturers for not fully understanding the Zen 3 architecture? Just a thought from reading about a lot of people who bought exotic hardware and are having problems that all show Kernel-Power before WHEA. (Please confirm you have a critical Kernel-Power event ID 41 error 10-20 s before the WHEA event ID 18 error in the main log.)
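If you export the System log from Event Viewer, that ordering check can be done mechanically instead of by eye. A rough sketch in Python, assuming the events have already been loaded as `(timestamp, source, event_id)` tuples; the tuple layout, the function name, and the sample timestamps are made up for illustration and are not taken from anyone's real log:

```python
from datetime import datetime, timedelta


def paired_events(events, first=("Kernel-Power", 41),
                  second=("WHEA-Logger", 18),
                  max_gap=timedelta(seconds=20)):
    """Return (first_time, second_time) pairs where `second` was logged
    no more than `max_gap` after a `first` event."""
    firsts = sorted(t for t, src, eid in events if (src, eid) == first)
    seconds = sorted(t for t, src, eid in events if (src, eid) == second)
    pairs = []
    for s in seconds:
        # Keep only `first` events that precede this one within the window.
        candidates = [f for f in firsts if timedelta(0) <= s - f <= max_gap]
        if candidates:
            pairs.append((candidates[-1], s))  # closest preceding event
    return pairs


# Hypothetical sample data, for illustration only.
sample_events = [
    (datetime(2021, 1, 1, 12, 0, 0), "Kernel-Power", 41),
    (datetime(2021, 1, 1, 12, 0, 12), "WHEA-Logger", 18),
    (datetime(2021, 1, 2, 9, 30, 0), "WHEA-Logger", 18),  # no 41 nearby
]
pairs = paired_events(sample_events)
print(len(pairs))
```

Only the first WHEA event in the sample is paired; the second has no Kernel-Power 41 entry within the 20-second window, so it would point to a different pattern.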
Which ASUS BIOS are you running? The current beta with the USB power fix works fine, although the USB issues still show up after 12 hours on my ASUS Strix B550-E; my computer is on from 6 AM to 12 AM during the work week, which is how I know that.
For example, my RAM is not listed on the supported RAM sheet: I have 64GB of Crucial RGB Pro CL18/3600, confirmed B-die, so it needs to run at over 1.4 volts on the ASUS board. (I am not an engineer, I am just telling you what has made my system work. 1.35 to 1.5 V is where others have run 2 to 4 sticks of B-die on the ASUS X570/B550 boards. I have a Prime X570 Pro that my 3900X was in, and then my 5900X, before I wanted something new, hence the B550-E.) I am running the DRAM at 1.45 V with the FCLK set to 1800/3600; the timings are auto-set and very loose, as you can imagine, but it works, it is fast, and I haven't messed with that at this point.
Have you made any PBO/PBO2 changes (in AiTweaker vs. the Advanced menu), or is everything on Auto besides your RAM adjustments?
Texas!
In my case the CPU is stable for now, but the GPU crashes, mostly while playing.
With these settings the CPU became stable:
Enable PBO
Set power limits to Motherboard.
Set curve offset to 0 for all cores.
Set PBO boost override to 0.
Leave PBO Scaler at auto or start with a conservative 1x or 2x setting (you can raise this later if you wish but I have not found it to make much difference)
My PSU is old, so I changed Power Supply Idle Control in the BIOS from Auto to Typical Current Idle.
PCIe is set to Gen3 but the GPU still crashes,
and my mobo is an MSI MPG X570 Gaming Pro Carbon WiFi Rev 1.0, BIOS 1.C2.
After contacting AMD tech support, below is the full list of troubleshooting they advised me to try, maybe one of the other solutions will work for you:
Update the system BIOS to latest version available from motherboard manufacturer (refer to motherboard user manual for instructions on updating the BIOS).
Set the BIOS to use factory default settings / optimized default settings (refer to motherboard user manual for instructions on restoring BIOS default settings).
In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).
Update Windows to the latest version and build via Windows Update. For instructions, refer to article.
Update to latest chipset driver from AMD. For instructions, refer to article.
In Windows Control Panel, select Power Options and choose the Balanced (recommended) power plan. In Windows Settings, select Power & sleep and set the Performance and Energy slider to the middle.
Disable non-Microsoft services and startup items using the System Configuration Tool.
Reseat CPU, RAM, and all PSU power connections (end-to-end for modular PSUs). For more instructions, refer to the product’s user manual.
Verify RAM sticks are installed in the correct DIMM slots (for socket AM4 motherboards with 4 DIMM slots, use A2 & B2).
https://support.microsoft.com/en-us/windows/windows-update-faq-8a903416-6f45-0718-f5c7-375e92dddeb2
I'd contact AMD if you're still having issues.
Thx for your reply! Yep, we can see it like that!
That is part of the reason I am so frustrated...
My GPU is an Asus Rog Strix OC 3090.
You are right, I got an error 41 exactly 12 seconds before error 19, then 18.
My bios is the 3401, last available on Asus website.
I am running all stock but RAM setup since I had to downgrade the 4400 1.4V to 3800 with FCLK at 1900.
Note that I can let it run in all benchmarks for hours with no issue. (OCCT, Cinebench, etc...)
That unexpected reboot happened when gaming (Battlefield 5). Highest temp by then was 79°C.
I found someone on another forum saying that it can be due to the AMD CBS PSU idle setting being at 'low current idle' rather than 'auto', which I just changed, but I'm pessimistic about that one since I believe it relates to idle periods, which is not the case when gaming in an FPS like this.
I did not change anything else like C State or other... so any advice will be welcomed!
Many thanks
I never had any sudden reboots while playing games. I did have a bunch of game crashes, but that's Rockstar allowing hackers to crash games.
See if it's happening in multiple games.
I did have a crash immediately after quitting a game, with the PC going to zero usage, but that only happened a couple of times; the rest were at idle, and my PSU wasn't a recent design.
Your RAM runs at a very high speed. Is it one of the QVL models for your mobo? RAM speeds aren't always guaranteed.
Do you have more details on the error 41?
Many issues could cause this same result, the PSU idle thing was new to me though.
@Fastmikefree I wrote you an incredibly long response with all my settings and my BIOS setup... and it's gone. Like an idiot, I didn't save it either. This board has a moderator problem.
hi, I didn't catch up on the whole thread, so sorry if you tried this, but
try manually overclocking the RAM and see if you can find something stable.
I had reboots at idle, WHEA 19 errors, and Kernel-Power 41 errors as well. My RAM is 3600 MHz via the XMP profile.
I disabled XMP, then slowly worked my way down until I hit a stable config: 3200 MHz with timings adjusted accordingly, at 1.36 V. Haven't had a reboot in two days now. The WHEA errors are gone.
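That step-down approach amounts to a simple search: try speeds from fastest to slowest and keep the first one that survives testing. A rough sketch in Python, assuming a hypothetical `is_stable` callback that stands in for whatever manual testing you do (days of uptime, no WHEA entries in the log); nothing here talks to real hardware, and the candidate speeds are only examples:

```python
from typing import Callable, Iterable, Optional


def first_stable_speed(
    speeds_mts: Iterable[int],
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Try speeds from highest to lowest; return the first stable one, or None."""
    for speed in sorted(speeds_mts, reverse=True):
        if is_stable(speed):
            return speed
    return None


# Stand-in for real-world testing: pretend everything above 3200 MT/s crashes.
def pretend_test(speed: int) -> bool:
    return speed <= 3200


candidates = [3600, 3466, 3333, 3200, 3000]
print(first_stable_speed(candidates, pretend_test))  # prints 3200
```

In practice each "test" takes days rather than milliseconds, which is why stepping down in order and stopping at the first stable speed is usually all anyone has patience for.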
I should mention, this configuration was stable for nearly two years at 3600mhz (running a 3700x). I did update the BIOS, but these problems did not appear until almost 3 weeks later. Maybe coincidental, or bug in the BIOS/AGESA that took some time to appear. I don't know...I don't want to keep flashing the BIOS to find out, since I am stable again.
My suspicion is that an FCLK over 1600 MHz is causing it.
I can also report that setting power supply idle control to 'typical' did not work for me. It seemed to at first, but WHEA 19 errors and reboots started up again after a few days. So for now I'm keeping the RAM at 3200/FCLK 1600 until there is an MSI BIOS update.
@BillAdama I wouldn't wait in hope of a BIOS update unless you're sure one is coming that will fix the issue.
On page 68 I posted a list of the full troubleshooting AMD asked me to do, if none of that works it might be worth contacting them to see what is the next step.
I've seen several things fix the issue. Setting the PSU idle control to Typical and disabling C-states both fix it in a similar way. Then I've seen overclocking/underclocking (which can also disable C-states) and BIOS updates, which seem to point to smoothing out the power balance, etc. And finally, some people have solved it by replacing the CPU via RMA or via store return policies.
Assuming all other troubleshooting has been completed, all of the above fixes except a BIOS update and CPU replacement might only be masking the issue. CPUs should be stable. I've never had this issue in all the PCs I've built before, but others have, even with Intel, so it could just be very rare. I wish I knew the RMA numbers.
Even though the issue hasn't returned in the weeks since I set the PSU idle control to Typical, I still wonder whether the problem is my PSU or my CPU. I won't know until I swap one of them out. I'm hoping it's the PSU, but for the next person it might not be.
Anyway, this is a hardware issue, and a BIOS update might fix it, but if all the troubleshooting has been done then the real fix is to identify the faulty part and swap it out. I doubt all MSI mobo users are experiencing this; we'd probably have heard news about it if it were that widespread.
Good luck, I hope you get to the bottom of it.
@Cmdr-ZiN Thanks, I did try those suggestions with no improvement unfortunately. Thank you for putting that list together
In fact MSI is aware of the WHEA 18/19 errors, as you can see here, and also of the aforementioned RAM/FCLK instability above 3200: https://forum-en.msi.com/index.php?threads/msi-x570-b550-beta-bios-update-bug-status.348919/
Now I am running b450 and the 3700x. However the problem appeared for me after new BIOS were released to account for 5000 series chips. I would bet the causes are the same regardless of cpu. In that thread MSI say to await a BIOS update which is forthcoming: https://wccftech.com/msi-amd-ryzen-agesa-1-2-0-2-bios-firmware-for-x570-b550-motherboards/
I could rollback to an old BIOS, but I am also ok running stable at 3200mhz for now...and waiting for that BIOS update to come that hopefully solves it, since I don't want to flash it any more than I need to. Plus the performance difference is very small.
@BillAdama wow, awesome info, that sheds a lot of light on the subject. I'll have a read. Yeah, definitely wait for the BIOS update or roll back in your case.
I have the current latest update, but I might need to try putting PSU idle control back to Auto after the next update. Sounds like they still have some work to do.
However, I can't confirm it, but based on my calculations a power draw under 6 W may not be stable for my PSU. I have my eyes on a PSU that can handle a 12 V rail at 0 A when I have the time and money, but I might just wait until after the next BIOS update out of curiosity.
"Are all you guys with WHEA reboots running BIOS with AGESA 1.1.9.0 or 1.2.0.0?"
AGESA 1.1.9.0 - still crashes.
"any word from AMD on all this?"
There are no words from AMD. They do RMA such CPUs - and for most people, the 1st replacement works just fine.
But some 'lucky' people got 2 defective CPUs in a row - so they need to replace the CPU at least twice to get a working build.
Is AMD honoring RMAs for those of us with WHEA crashes? I imagine the process is long and a pain in the butt.
This is the most frustrating PC experience I've had in 30+ years of PC building. How is it possible that I can pass memory tests for 7+ hours, run CPU burn-in tests and benchmarks for hours without any errors or thermal throttling, but idle time reduces my system to a drooling rebooting failure machine?! How can AMD not have a clue how to fix this? I've tried all the fixes people have been talking about - they may make the issue less frequent, but nothing has fixed it permanently:
I have spent DAYS testing these settings in different combinations and one at a time and have not been able to completely stabilize my Asus Dark Hero / 5950x PC. I watch daily for new BIOS and drivers and update them as soon as they are available, but no fix yet.
I'm really reluctant to exchange the CPU, because it seems like the people saying it fixed their issues many times just didn't give it enough time to fail again. I don't want to wait days/weeks to get a replacement to end up in the same place.
Let me know if I've missed something obvious.
One thing that definitely works in every case I've seen is disabling CPB and setting a manual all-core overclock. This will hurt your lightly-threaded performance a bit of course. But I've got to say, in all the posts I've seen where that worked, upgrading to an AGESA >=1.1.9.0 fixed it too. But worth a shot.
I'm feeling your pain. This was the first time I've had the opportunity to upgrade my system in a long time. I was excited to come back to AMD given the fanfare about the new CPUs, but it has been a **bleep** fest. I'm really trying to hold out for a BIOS fix, but this unreliability is really wearing on my patience. I'm afraid to do anything of consequence only to be met with a blue screen. Thinking I should have stayed with Intel.
Completely agree. Given how insanely difficult it was to buy a Zen3 CPU in the first place, it's a real kick in the pants dealing with instability. I finally fixed mine (I think, still got fingers crossed) but I would have bought Intel if I knew what a pain in the butt I was in for. Intel you can click "buy now" and have it shipped to your door next-day, and it will just work.
Yep, I feel your pain. I haven't been able to get a working build for 2 months now. I’ve just sent my 5900X in for service again and rebuilt my X99 system with the 5820K, which has worked without a single BSOD for 7+ years.
But, unfortunately, Intel has nothing comparable to the 5900X/5950X right now. When I see my 12-core 5900X do almost the same job as Intel’s 18-core 10980XE, it gives me the strength to keep fighting for a stable AMD build.
So the solution is to replace the processor?
"So the solution is to replace the processor?"
The best solution is the replacement, for sure. Otherwise you will need to give more power to the CPU in the BIOS to stabilize it, or that extra power will be added automatically in future BIOS versions and the issue will be solved just by an update.
I will wait for an update. I don't want to make something in the BIOS and lose the warranty. The stock at the local store is low and I think they will not give me another CPU. Thx for your reply!
IS! This problem is too annoying.
I got another WHEA error this weekend. The problem is that there is no time to generate the dump.
I am trying to force the error with OCCT v7.30.
Before, I set the CPU VCORE to 1.30v and 1.35v memory.
In my case, I don't think it's the CPU or the memory. The CPU worked correctly on an MSI X370 as well. The RAM was exchanged, and I bought sticks that are on the QVL of the current mainboard.
I only have 3 suspects, in order: PSU, graphics card, and mainboard. However, the PSU has always worked correctly for me in previous setups. The current mainboard behaved well until I installed the graphics card; then games started crashing. After AGESA 1.2.0.0 things have stabilized a lot, although I still get a WHEA error now and then. I am considering that this may in fact be a problem with the BIOS microcode. We have cases here where people changed everything and still had problems with the new system. Another likely possibility is the chipset and graphics drivers. I had a WHEA record on a USB 3.0 hub.
AMD needs to do something about it. This forces the customer through MANY false positives. I've used AMD since the K5. Honestly, the level of quality is still not what I expected. I don't like Intel, but at least it works. You drop in the CPU and use it without a headache. It can be 14nm++++ and draw a lot of watts, but the thing works steadily and rarely has bizarre problems like the ones we are facing.
I am rethinking the future. AMD is not yet what we want it to be, and I find that silly. We need the system to work steadily, and instead we go through these things.
So, let's make the problem public through technology channels.
I will try to do that now.
AMD must keep an eye on their own forums. Surely someone has passed these issues up the chain. Are they commenting on people's issues anywhere? Nothing is worse than silence from a company when you are having problems with their product.
I heard from others with RX 5700 XT graphics cards that they solved their problems in one of two ways:
1- With driver updates;
2- Getting rid of the card and buying an NVIDIA one.
I was advised to contact the card's support or return it, since AMD had not yet solved the RDNA1 problems. I was kind of worried about that.
Are you guys running HWiNFO in the background? If so, close it, don't use it. It has been discovered recently that this monitoring program is causing these WHEA-Logger errors and reboots. Look at Jackalito's post right here:
Is HWiNFO causing the WHEA-Logger Event ID XX Cach... - AMD Community
"Are you guys running HWiNFO in the background? If so, close it, don't use it. It has been discovered recently that this monitoring program is causing these WHEA-Logger errors and reboots. Look at Jackalito's post right here"
I used the portable version of HWiNFO. In my tests I did not see any correlation between WHEA errors and HWiNFO. I mean, I was able to reproduce WHEA errors easily without running HWiNFO.
I'm talking about random restarts at idle/low load, resulting in a WHEA-Logger ID XX error in the Event Viewer.
This issue caused by HWiNFO has been confirmed by several users in a Spanish forum.
Hmm, I've had WHEA crashes with HWiNFO running and not. I'll stop using it for a while and see if I get the same results.
' Are you guys running HWINFO in the background? '
no, I never used that...
At this point I'm wondering if there's a risk that either RAM or even the PSU could get damaged by this, so I just shut down the computer and asked Amazon to return that garbage.
Also, a piece of advice for those who are served by Amazon in their countries: always buy this kind of stuff when it is either sold or at least shipped/fulfilled by Amazon. It will be much faster to return a defective product and have it replaced.
Intel is garbage but at least their products work. All I've learned from this experience after 15 years of Intel-only builds, is that AMD is as garbage as Intel from a consumer perspective, but you also need luck for their products to work fine.
I saw what you mentioned here, and I did some more research. It looks like just opening the program is enough, right? It seemed to occur when the BETA and FINAL versions conflict. Very strange. I don't use the beta, so there should be no conflict. I will try to replicate the failure with it and leave the system idle. However, I recently upgraded to version 6.42. Let's see what happens.