Mainboard: MSI x570 Unify
Mainboard-BIOS: 7C35vA82 (Beta version)
CPU: Ryzen 5900x
RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2)
Drive: M.2 Samsung 970 Evo+ 1TB SSD
Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT
PSU: be quiet straight power 11 750w Platinum
OS: Win 10 Pro (64bit) - all updates installed
Chipset driver: 184.108.40.2069 (released 2020-11-09)
I first assembled the PC with a Ryzen 3800X a week ago because it was unclear if and when I would get the Ryzen 5900X I ordered. It worked with the included AMD Wraith Prism CPU cooler for one week without any problems.
- Today I installed a Ryzen 5900x and a Scythe Fuma 2 CPU cooler.
- After 20 minutes, the first crash/restart, with the following entries in Event Viewer: WHEA-Logger ID 18 and the critical error Kernel-Power ID 41.
- It happens irregularly, again and again, sometimes after minutes, sometimes after longer: Windows freezes for a few seconds and then the PC reboots. It doesn't matter whether the system is under load or not.
- CPU temperature between 30 and 40 °C
- Updated to BIOS and chipset driver mentioned above: Problem still exists
- XMP Profile disabled (RAM on 2600 MHz): problem still exists
- CMOS Reset: Problem still exists
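For anyone who wants to pull the two event types from the first post without clicking through Event Viewer, here is a minimal Python sketch. It assumes Windows' built-in `wevtutil` tool and Python 3.9+; the helper names (`build_xpath`, `build_command`) are hypothetical, and it only lists the newest matching events from the System log.

```python
import subprocess
import sys

# (provider, event ID) pairs mentioned in the first post
EVENTS = [
    ("Microsoft-Windows-WHEA-Logger", 18),
    ("Microsoft-Windows-Kernel-Power", 41),
]


def build_xpath(provider: str, event_id: int) -> str:
    """XPath filter selecting one provider/event ID combination."""
    return f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"


def build_command(provider: str, event_id: int, count: int = 10) -> list[str]:
    """wevtutil command printing the newest `count` matching System-log events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_xpath(provider, event_id)}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # reverse direction: newest first
        "/f:text",       # human-readable output instead of XML
    ]


def main() -> None:
    if sys.platform != "win32":
        print("wevtutil is only available on Windows")
        return
    for provider, event_id in EVENTS:
        print(f"=== {provider} ID {event_id} ===")
        result = subprocess.run(build_command(provider, event_id),
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    main()
```

Run it from an elevated prompt if the System log is not readable from your account.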
Either something has a compatibility problem with the CPU, or the CPU is defective?
What to do? Really frustrating.
I'm having a similar issue with an X570 Aorus and a 5600X. I get the same errors in Windows.
Disable CPB and PBO and run it at default settings (3.7 GHz, XMP on). That works for me.
I have a new angle on this. Deactivating PBO and CPB definitely works; the PC has been running stable for a week now. But you'll lose performance.
So I wrote to MSI support and AMD support.
MSI suggested increasing the DRAM voltage by 0.05 V, which I did. The system seems to be stable, with no crashes so far, neither at idle nor while gaming.
Stayed fixed for me too -- I've been running for almost 2 months now rock-solid and completely stable. No more WHEA logger errors. Hope everyone else finds a similar resolution.
Notable in my solution was a BIOS update, which seemed to set my G.Skill Trident Z memory to 1333 MHz instead of the 3200 MHz it is rated for.
@kneel420 that is interesting, it really is starting to look like a RAM/CPU combo.
The lowest mine went was 2133 MHz, which I think is stock on a 5800X. A different CPU might go lower. I wonder whether lowering my RAM further back then would have helped.
If this setting is below default spec then you probably are entitled to replace the CPU if that's the part holding you back.
You could try CTR 2.1 and hit Diagnose; it'll tell you the quality of your CPU. I don't know how accurate it is, and it rates more how good your chip is for overclocking or undervolting. It also may not tell you how good the memory controller on the CPU package is. Still, it may be worth a shot if you're curious.
It looks like the fix might be higher quality CPU and RAM, and a good test would be to keep lowering the RAM speed until it's stable. As well as the usual BIOS and chipset updates, I'm sure they help. I'd love to see if this helps others too.
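The "keep lowering the RAM speed until it's stable" approach is a simple downward search. In this sketch, `is_stable` is a hypothetical stand-in for the manual part (set the speed in the BIOS, run stress tests, then let it idle), and the list of DDR4 speed steps is an assumption; use whatever steps your BIOS actually offers.

```python
from typing import Callable, Optional, Sequence

# Assumed DDR4 speed steps in MT/s, highest first; adjust to your BIOS.
DDR4_STEPS = [3600, 3466, 3333, 3200, 3066, 2933, 2800, 2666, 2400, 2133]


def find_stable_speed(start: int,
                      is_stable: Callable[[int], bool],
                      steps: Sequence[int] = DDR4_STEPS) -> Optional[int]:
    """Walk down from `start` and return the first speed that passes `is_stable`.

    Returns None if even the lowest step fails, which would point at
    hardware rather than settings.
    """
    for speed in steps:
        if speed > start:
            continue  # skip steps above the starting (rated) speed
        if is_stable(speed):
            return speed
    return None
```

For example, if a kit only holds up at 3200 MHz and below, `find_stable_speed(3600, lambda s: s <= 3200)` returns 3200 after three failed steps.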
Anyway if you're happy with it then I hope it stays stable for many years for you
I am running four sticks of G.Skill Trident in an Asus TUF X570 motherboard with a 5950X. The system runs for 1.5 to 3 days at a time and then, typically overnight while idle, hits the WHEA 18 error and restarts.
This is a huge problem as this is for a project and not gaming at all. Losing money on this issue fast.
Currently I am testing the Curve Optimizer +5 fix. If it stays stable for a week or more, I need to run a benchmark before and after the change. Does anyone know how much this handicaps the system?
Can anyone say they have found the definitive fix for this terrible issue?
@cubic777 what frequency is your memory running at? A motherboard BIOS update (Aorus Master X570) fixed it for me. My G.Skill Trident 3200 MHz RAM runs at 1333 MHz with XMP disabled in the BIOS.
It is 3600, and I have tried that via DOCP on the Asus board as well as the lower 1.6. I updated the BIOS to the latest about 4 weeks ago. I can check again, but I'm not sure Asus has changed much since then.
Currently I am testing a fix that has been mentioned: setting the Curve Optimizer to +5. I am not sure how much that slows down the system. So far it has run 8 hours, but my WHEA ID 18 error typically takes 1-3 days to happen. It is so bad that I even considered writing a Python script that stresses the processor all night so it is never idle.
If this works, then I have to go to the lowest volt increase that does not create the issue. Seems like a waste to buy top of the line only to bottleneck it so it actually runs. If not, I might try your motherboard or an MSI.
Make sure EDC is set to 140.
Make sure none of the cores are too far negative in CO.
Don't set a vcore offset.
Make sure VDDP is 1.03
The rest might not matter, but... set SOC to 1.16, VDDP 1.03, CCD 1.1, IOD 1.125, ProcODT 44, CAD bus 24-20-24-20, and your RAM voltage to about 1.45. Set tCL to about 16 and tWTRL to 9-12 (see Ryzen DRAM Calculator), RTT_nom RZQ/7, RTT_wr RZQ/3, RTT_Park RZQ/2.
Sometimes the BIOS gets invisible corruption, and I've had to flash it and restart twice. I've also had to replace the CPU twice. The first chip would reboot and freak out. The second chip would run, but some of the cores would try to burn and cores reset a lot. With the third one I get a few core resets, but then my loads stabilize and it runs all-core up to 4640 MHz in the Prime95 SSE Large torture test (AVX and AVX2 disabled).
@cubic777 you can try to tweak it with overclocking and undervolting settings until you stabilize it, however I recommend you get it running stable at stock first before messing with any performance settings. It's possible you might never get it stable and you really need a baseline to work off when tweaking things.
I believe some Ryzen CPUs have stability issues, and that RAM selection makes this worse. I suspect the most stable RAM would be RAM on your motherboard QVL, at or below 3200 MHz (the speed Ryzen 5000 is rated for), and with "AMD compatible" on the sticker.
My first CPU was unstable no matter what but we now have newer BIOSes and chipset drivers.
My second 5800X CPU was great for months until it became unstable and died a week or two later, completely stock settings with DOCP off.
My third 5800X CPU crashed a couple of times within 48 hours of installing it. I was already on the latest of everything except the chipset drivers; these don't auto-update, so you need to update them manually. Since updating to the latest chipset drivers I've been unable to make it crash, however hard I try: I have run every burn-in test and left it idle for many weeks. I am now on DOCP 3600 MHz CL16, a kit that was removed from my motherboard QVL for the 5000 series chips only, but that was later added to the QVL on the G.Skill website, where it wasn't originally.
I recommend updating everything in Windows, updating your chipset driver (this is important) and then clearing your CMOS (also important). Leave everything stock and don't enable DOCP: your chip may not be able to handle it, it counts as overclocking the memory controller on your CPU, and a lot of people have difficulties at 3600 MHz and above.
CTR 2.1 software has a diagnostics feature that will tell you the quality of your chip, I don't know how accurate it is. Others that have replaced their chips and gotten a stable system have a silver sample, mine is a golden sample apparently. However I suspect this is more rating the cores and not the memory controller.
Try some burn-in tests, then leave it idle for 2 weeks. If you still can't get it stable, then you have hardware that needs replacing.
It will most likely be the CPU at fault, but I suspect slower, more stable RAM can help a lot. See if you can borrow parts from a friend, because it could be anything: judging by what people have replaced on various forums, it could be your PSU, motherboard, graphics card, RAM or CPU. There could also be other issues. However, a lot of people come back later saying the fix hasn't worked.
Even many people who have swapped their CPUs have still had the issues, but either the CPUs are better now or the drivers and BIOS are, because people who swap their CPU now are having more success.
@cubic777 - Nothing can fix it. But there is a workaround: using a fixed clock. Nothing else besides that.
This error is a Zen 3 design flaw that AMD refuses to talk about. Every day we get new reports of it.
I've created a thread describing the behavior, the workaround and the inevitable RMA that I did to finally get rid of this defective product. In my case, 3 different systems were tested (including different CPU, RAM Modules etc) and with PB or PBO, the system would crash while idle, no matter what.
My suggestion: Set fixed clock or RMA it, there is no other way. And take care with the AMD fan boys around here, they will suggest a bunch of useless configuration and will also consider that some of your parts are the problem, not the CPU itself, which is not the case, considering the MASSIVE number of cases.
I did an RMA in May this year and now I'm stable with an Intel system. From 11/2020 to 05/2021 I suffered with WHEA errors. AMD is not a decent company, simple as that; they are selling defective products.
Take a look:
Ryzen 5000 Crashes (WHEA Errors) will get a "Silen... - AMD Community
Wish you the best!
Yeah, I don't know if that is true. It's not a design flaw; it's just a voltage problem with some cores. I fixed mine about a week ago and it has been rock stable ever since.
@crayraven - A week is not enough to confirm stability using PB/PBO.
Depending on the core count, the system could take more or less time to crash.
When I had the CPU, I could note that:
5900x - 6 Days to crash with PBO | 30 Days to crash with PB/CPB
5950x - 4 Days to crash with PBO | 14 Days to crash with PB/CPB
5600X/5800X could take even longer to crash due to their lower core counts.
No, it's stable. I know because, other than PBO, everything is at stock, and stock used to make it blue screen within 15 minutes of gaming or idling. It's a voltage issue, at least in my case. The solution for me was simply enabling PBO and turning down PPT, EDC and TDC. I believe my settings are 220 PPT, 150 TDC and 150 EDC. Other than that, I did not increase the CPU voltage at all; it's still on Auto.
Also, my PC has been on for just over a week.
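To put numbers on why a week may not be enough, here is a sketch that assumes crashes arrive at random at a constant average rate (an exponential model; nobody in the thread claims the crashes actually behave this way). Under that assumption, the probability of a crash-free test lasting t days, given a mean of T days between crashes, is exp(-t/T). The function names are hypothetical.

```python
import math


def p_no_crash(test_days: float, mean_days_between_crashes: float) -> float:
    """Probability of zero crashes during `test_days` (exponential intervals assumed)."""
    return math.exp(-test_days / mean_days_between_crashes)


def days_needed(mean_days_between_crashes: float, confidence: float = 0.95) -> float:
    """Crash-free days needed so that a still-unstable system would have
    crashed during the test with probability `confidence`."""
    return -math.log(1.0 - confidence) * mean_days_between_crashes
```

With the 6-day PBO figure quoted for the 5900X, `p_no_crash(7, 6)` is about 0.31, so a clean week would still happen roughly one time in three on a system that isn't fixed, and `days_needed(6)` comes out near 18 days. With the 30-day PB/CPB figure it is about 90 days.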
@cubic777 mind sharing the exact model number of your G. Skill Trident? G. Skill asked me to RMA my ram, and I'm going to do that now that I've switched over to Kingston HyperX.
I'm wondering what the JEDEC value of your RAM is, and whether your RAM is rated "AMD Ready"?
@Electric_Squall well obviously some things can fix it. It's most definitely an issue between the memory controller and the RAM timings.
I think a lot of us are using RAM that's rated at 3600 for Intel. This RAM has a JEDEC of 1000 and just doesn't let Zen 3 run stable with many cores. I've had zero issues since dumping the G.Skill RAM in favor of Kingston HyperX that is "AMD Ready" and has a JEDEC of 1200. That RAM is rated at 3200 but runs just fine at 3600 with 4 sticks on a 5950X that used to crash every 2-3 days (or every 3-4 hours with XMP enabled) when running the G.Skill at 3533 (it wouldn't even run at its rated 3600). G.Skill asked me to RMA it, even though it ran fine using XMP under a 3700X and passed RAM tests.
If you're constantly RMA'ing your CPU, maybe it's not just the CPU's fault. Have you tried different ram aside from G. Skill? I'm thinking G. Skill might be really crap ram for AMD.
So far, with the Asus TUF X570 Curve Optimizer set to +5, it has not rebooted. In a benchmark it makes the CPU run 0.38% slower, which is not significant. But it has only been a few days; in the past it would last about 2-3 days before the WHEA 18 disaster. Another ridiculous thing I am trying is coding a Python process that keeps all cores busy at 15% utilization when I am doing nothing. That is pure duct tape, but it is possibly working. It is hilarious that we need to do all kinds of magic to get these chips working.
I just had an invalid HTML error on my post here; the forum then auto-corrected it, removing the HTML issue. I then used the post button again with the corrected formatting. The site then threw a post-flooding error at me, falsely believing that the last post went through. I now have to wait ten minutes to post because it thinks I am flooding when I didn't even post anything today. I will return in ten minutes and post this, but maybe AMD could fix that code error on their website, and this chip issue too.
I had the same issue on a new install on an Asus B550-F: dozens of reboots a day, with the only clue being the WHEA-Logger error in Event Viewer, giving us fk all to work with except second-guessing the durability of other components. What worked for me is undervolting the CPU to 1.3 V (the motherboard default was 1.45 or something like that) and disabling Global C-state; every other option is untouched. I haven't had a crash since. I'm no OC nerd and didn't even want to go into the BIOS, but I had no other option. I would expect a 500 euro CPU to work out of the box.
So far I have had 6 days without the error after increasing the Curve Optimizer to +5, about 1.5 days longer than the typical restart. We'll see if this keeps working, but so far it is solving the WHEA 18 issue without any changes to RAM, which points back to CPU quality.
It would typically hit the WHEA error within 1.5 to 3 days. I am now on day 8 without it. All I did was keep DOCP on at 3600 MHz and change the Curve Optimizer to +5. I ran PassMark CPU tests 3 times before and 3 times after the change and compared the average performance. Running at Curve Optimizer +5 costs only 0.38%, in other words nothing. So if this stays stable, this was the fix for my system. I don't care about that very nominal performance loss, as the system is already in the 99.9th percentile.
Mine is fairly stable as well with the undervolting. In fact, I haven't managed to crash it yet with any sort of abuse I throw at it. Looks like each motherboard and RAM combo plus the silicon lottery needs to be fine-tuned for stability. Oh well, this is what we get for bleeding-edge hardware.
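The "keep all cores busy at 15%" idea can be sketched as a duty-cycle loop in Python: each worker spins for 15% of a 100 ms period and sleeps for the rest. The period, the function names, and starting one process per logical CPU (without pinning; the OS decides where each one runs) are all assumptions of this sketch. It only reproduces the load pattern described; whether that actually avoids the idle-time WHEA 18 reboot is exactly what the poster was testing.

```python
import multiprocessing as mp
import os
import sys
import time


def duty_cycle(utilization: float, period_s: float = 0.1) -> tuple[float, float]:
    """Split one period into (busy, idle) seconds for a target utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    busy = period_s * utilization
    return busy, period_s - busy


def keep_busy(utilization: float, duration_s: float) -> None:
    """Burn CPU for the busy slice of each period, sleep for the rest, until duration_s passes."""
    busy, idle = duty_cycle(utilization)
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        spin_until = time.monotonic() + busy
        while time.monotonic() < spin_until:
            pass  # busy-wait for the active part of the period
        time.sleep(idle)


def run_on_all_cores(utilization: float, duration_s: float) -> None:
    """Start one worker process per logical CPU and wait for all of them."""
    workers = [mp.Process(target=keep_busy, args=(utilization, duration_s))
               for _ in range(os.cpu_count() or 1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    # Usage: python keep_busy.py <hours>; does nothing without an argument.
    if len(sys.argv) > 1:
        run_on_all_cores(0.15, float(sys.argv[1]) * 3600)
    else:
        print("usage: keep_busy.py <hours>")
```

The `__main__` guard matters on Windows, where `multiprocessing` re-imports the script in each worker process.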
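The before/after comparison described above is easy to reproduce. This sketch assumes higher scores are better (as with PassMark's CPU Mark) and also reports the run-to-run spread, since with only 3 runs per side a difference of a fraction of a percent can sit within normal variation. The function names are hypothetical, and the scores in the example are made up.

```python
from statistics import mean, stdev
from typing import Sequence


def percent_slowdown(before: Sequence[float], after: Sequence[float]) -> float:
    """Percent drop in the average score after the change (higher score = faster)."""
    base = mean(before)
    return (base - mean(after)) / base * 100.0


def noise_percent(scores: Sequence[float]) -> float:
    """Run-to-run spread: sample standard deviation as a percent of the mean."""
    return stdev(scores) / mean(scores) * 100.0
```

For example, with made-up scores, `percent_slowdown([40000, 40100, 39900], [39850, 39840, 39860])` gives about 0.375, while `noise_percent([40000, 40100, 39900])` gives about 0.25, so the measured difference and the noise are of the same order.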
Id still rather burn out 5 chips before even considering the other "company"
Yesterday made 1 month since I RMA'd my 5900X, which was giving me really hard times with constant random reboots and WHEA errors at idle or simply while watching YouTube videos. I tried to fix it with every possible PBO setting.
Since the new CPU was installed, I've been on CO -30 on all cores (offset -150) and haven't had a single WHEA error since August 20.
For the record, I am using 64 GB (4 sticks) of T-Force ARGB 3600 MHz CL14 RAM; with the old CPU I even tried running only 2 sticks with XMP off, thinking it could be related to the memory controller.
So far so good.
So eventually after 10 days, I had the WHEA 18 again with +5 on the curve optimizer. Just changed it to +6 and we'll see how long it is stable this time.
I had assumed everything was OK after over a month of no issues... then, suddenly, a WHEA error. I pushed back the settings. Another one a few hours later. I went to stock A-XMP settings, and bam, another WHEA error 24 hours later.
At this point, I have to conclude the CPU is crap. I'm contacting my supplier for an RMA, or failing that, will directly RMA with AMD. I feel like there's grounds here for another class action lawsuit. Kinda sad that every new CPU release by AMD is followed shortly by a class action lawsuit. My old FX cpu, with zero problems netted me $60 in the class action, just because someone didn't like AMD's definition of cores. But for all the crap it got, I still have that old FX machine humming along, warming my feet as a beefy little nas.
Feels like the 5950x was just hastily thrown together to give Intel the finger. Unfortunately, we're literally the victims here.
I had terrible trouble with 2 of these CPUs. I installed a 5900X and started receiving WHEA errors every day. I tested every component with every benchmark I could think of: OCCT and the paper clip test for the PSU, MemTest86 and Windows Memory Diagnostic for the RAM, FurMark and 3DMark for the graphics card, and SeaTools for the HDD.
I also tried increasing the voltage on my RAM and tightening up the timings. I tried PBO and XMP off and still received WHEA errors. I RMA'd the first CPU and was shocked that the issue returned with the replacement. I also tried 3 different BIOS revisions. Since I returned the 2nd CPU, the motherboard manufacturer Gigabyte has come out with a new BIOS revision. I'm unsure whether this improves the situation, but stability on this CPU is awful and I can't keep wasting my time with this. I think I'll just wait for AMD to get their **bleep** together, or if they don't, I will switch back to Intel.
I will definitely be more careful with my next purchase and keep checking in here to see if this issue is resolved.
@authorized to ill
I went into "all that" because not everyone is as computer savvy as yourself.
However, since we are dealing with all types here, I will make it more simple for you.
You should be able to translate this into the necessary steps.
1. Boost voltage to both the cores and the Internal memory controller by the smallest possible increment
2. Lower power to prevent the boosting to the highest frequencies.
3. Cap temperature to what ever you are comfortable with.
Oh, and by the way, you are not running at stock settings. You are running at the default settings supplied by the motherboard manufacturers. If you want to run in spec, turn off CPB, PBO, XMP and DOCP.
There are numerous types of people out there: Those that need help, Those that can offer help, and those that want to complain and snicker at those attempting to make things better.
OK, it looks like mine (5950X) has become stable (so far). What I have done is:
1) Go to BIOS and 'Load Optimized Defaults' (note that CPB is Enabled by default and I leave it enabled);
2) PSS Support -> Disabled;
3) Global C-state Control -> Disabled;
4) Power Supply Idle Control -> Typical Current Idle;
5) Power Down Enabled -> Disabled (for DRAM);
6) Gear Down Mode -> Disabled (for DRAM);
7) Set XMP to Enabled for the DRAM (because my DRAMs support XMP 2.0);
I have been testing the system for 5 days (24h/day of calculations, on all 16 cores) and until now I haven't had WHEA 18. Hope this will help other people.
My 2 cents,
To those who have tried everything else, here was my fix: unplug all of the fans from the motherboard. Fan headers on motherboards have an amperage rating, and apparently overloading this circuit can cause stability issues elsewhere in the PC.
I had this same random rebooting/stability issue for a few months and replaced every single component in the PC trying to fix it, including a new processor and motherboard, and spent about 40 hours adjusting BIOS settings and firmware. The last component replaced was the power supply: I swapped out the 1000W power supply for a "known to be good" 750W power supply, and the computer wouldn't boot with the 750W, even though online calculators indicate a power supply as small as 350W would work. The computer was already stripped down to the bare minimum for diagnostics (1 video card, 1 stick of RAM), so my only option to reduce power draw was to unplug the extra case fans. With only the CPU cooler fan plugged in, the computer booted right up and had no stability issues.
With the computer stripped down and all 5 case fans plugged in, the computer is unstable.
With all components reinstalled (all 4 sticks of RAM, 2 graphics cards) but only 1 case fan instead of 5, the PC is stable and has not had a single random reboot in 6 weeks.
I just want to share my experience facing exactly the same issue you guys are facing.
My setup is
CPU : Ryzen 9 5900x
MOBO : MSI MPG X570 Gaming Plus
RAM : PNY XLR8 DDR4 8GBx4 3200MHz
GPU : Zotac 3070ti Amp Extreme Holo
PSU : Thermaltake Toughpower GF-1 1000W
AIO : Silverstone PF360
I've faced exactly the same thing as the replies before me. My PC works perfectly fine most of the time, but when I play games, whenever I'm in a low-load scene like a multiplayer lobby, my PC crashes. It doesn't show a BSOD, only a black screen or the last frame rendered by the GPU. I was very annoyed, since I've spent a lot on the PC and this is the first PC I've built myself.
I went to Event Viewer and found Kernel-Power 41 every time, and I had no clue what had happened. I've tried the methods mentioned in this thread, but I just don't want to sacrifice any performance for stability that we should have gotten without any effort after buying AMD chips.
It turns out that for my system, increasing the DRAM voltage to 1.42 V fixed the random reboots under low workload. According to the RAM specs, it runs at 1.35 V. IIRC the maximum voltage for DDR4 is 1.5 V, so I think raising 1.35 V to 1.4x V shouldn't be a problem, and this solved my issue.
BTW, everything in the BIOS except the fan curve is at default settings.
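The arithmetic behind that change, as a small sketch: going from the rated 1.35 V to 1.42 V is a 0.07 V (about 5.2%) increase, while the 0.05 V bump MSI suggested earlier is about 3.7% on the same 1.35 V. The ceiling is a value you choose yourself; the code does not encode any official DDR4 limit, and the 1.5 V in the post is the poster's recollection. The function name is hypothetical.

```python
def dram_voltage_change(rated_v: float, new_v: float, ceiling_v: float) -> tuple[float, float]:
    """Return (increase in volts, increase in percent of rated_v).

    Raises ValueError if new_v exceeds the user-chosen ceiling_v.
    """
    if new_v > ceiling_v:
        raise ValueError(f"{new_v:.3f} V exceeds the chosen ceiling of {ceiling_v:.3f} V")
    delta = new_v - rated_v
    return delta, delta / rated_v * 100.0
```

For example, `dram_voltage_change(1.35, 1.42, 1.45)` returns roughly `(0.07, 5.19)`.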
Hope this would be helpful to you guys.
TAKEOWN /F C:\WINDOWS\SYSTEM32\WHEALOGR.DLL
ICACLS C:\WINDOWS\SYSTEM32\WHEALOGR.DLL /GRANT ADMINISTRATORS:F
REN C:\WINDOWS\SYSTEM32\WHEALOGR.DLL WHEALOGR.DL
THAT TAKES CARE OF THAT, NOW FOLLOW THE 'RUSSIAN GURUS' GUIDE AS WELL AS USE CPU CORE PARKING UTILITY, TURN CORE PARKING TO 0%, ALL OTHERS 100%. THIS COMBINED WITH:
With great help from some Russian gurus I finally found (I hope) a solution for my case.
Just to remind you, my problem specifics were crashes in transitions from loads to idle, in idle within 30 seconds after a transition, or when applying a load again after these 30 seconds. NEVER under a load. Turning the Core Performance Boost off eliminated the issue together with the CPU performance. If you're in the same boat, try this, it should help.
The following is for the ASUS bios, for other vendors the same parameters may be hidden in another place.
The system is stable so far with the following BIOS settings:
Go to AMD Overclocking and set Precision Boost Overdrive to Manual. Some additional parameters will appear. In there:
1. (The main thing) Set the EDC current limit to 200A.
2. (Just in case) Set the power limit to 130W.
3. (Just in case) Set the temperature limit to 83C.
1 is an increase; 2 and 3 are decreases. Leave all the rest there at zero.
Also, just in case, set Idle Voltage to Typical and Global C-state Control to Disabled, and check that ECO mode is Off. Then you can set Core Performance Boost back to On, and everything should work.
It looks like the MB and its BIOS weren't tested with a 5000 CPU at all (or, if they were, it was like "OK, it boots, that means it works, great, the job's done"), and the BIOS just doesn't know about the larger peak currents of Ryzen 5000s, so the BIOS' "digital fuse" is just too small for the new CPU. When changing its clocks, the CPU probably tries to draw more current, the "fuse" (EDC current limit) kicks in, and the CPU malfunctions and produces a BSOD.
These currents (or how the "fuse" works) also definitely depend on MB and/or CPU heating (I didn't have any BSODs when cooling the open case with a cold hair fan), which explains why not everyone with a config like mine has the same problem: people with better cooling (or a colder GPU) might be OK at defaults.
I SHOULD WORK FOR AMD
For WHEA errors, please check the following KB article for suggestions on how to troubleshoot your issue: Troubleshooting Tips for Resolving System Stability Issues | AMD
Some other suggestions for WHEA errors.
1. Ensure you are on the latest BIOS for your motherboard.
2. Please check whether using the motherboard BIOS default settings (including memory running at the JEDEC spec of 2133 MHz) resolves the WHEA error.
3. Check that the system memory you are using has been validated by the motherboard manufacturer on the QVL (Qualified Vendor List), and that your memory configuration, e.g. 2/3 DIMMs, has been validated to run at the memory frequency you are using.
4. Check that you are supplying sufficient voltage to your system memory. Sometimes the motherboard manufacturer may suggest a slight increase in memory voltage to obtain stability, e.g. from 1.35 V to 1.36 V.
5. Check that your Windows OS and Chipset drivers are up to date.
6. Check that no components on the system are overclocked or are overheating, this includes graphics cards.
7. Check that your PSU meets the requirements for your system.
8. Do not use PBO to overclock the processor (or any other component in the system).
Since the OP of this discussion has a correct answer and solution (increasing DRAM voltage) I am locking this discussion.
If you continue to experience WHEA errors on your system after checking the steps above and validating each one, please open a new discussion providing the information requested here: https://community.amd.com/t5/knowledge-base/information-required-when-posting-a-discussion/ta-p/4227...