@Electric_Squall well obviously some things can fix it. It's most definitely an issue between the memory controller and the RAM timings.
I think a lot of us are using RAM that's rated at 3600 for Intel. This RAM has a JEDEC of 1000, and just doesn't allow the Ryzen 3 to run stable with many cores. I've had zero issues after dumping the G.Skill RAM in favor of Kingston's HyperX sticks that are "AMD Ready" and have a JEDEC of 1200. The RAM is rated at 3200, but runs just fine at 3600 with 4 sticks on a 5950X that used to crash every 2-3 days (or 3-4 hours with XMP enabled) when running at 3533 (wouldn't even run at its rated 3600). G.Skill asked me to RMA it, even though it ran fine using XMP under a 3700X and passed RAM tests.
If you're constantly RMA'ing your CPU, maybe it's not just the CPU's fault. Have you tried different RAM aside from G.Skill? I'm thinking G.Skill might be really crap RAM for AMD.
So far, with the Curve Optimizer on my Asus TUF X570 set to +5, it has not rebooted. In a benchmark, it makes the CPU run 0.38% slower, which is not significant. But it has only been a few days. In the past it would last about 2-3 days before the WHEA 18 disaster. Another ridiculous thing that I am trying is coding a Python process that keeps all cores busy at 15% utilization when I am doing nothing. That is so duct tape, but it is possibly working. It is hilarious that we need to do all kinds of magic to get these chips working.
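For anyone curious what that kind of keep-the-cores-busy script could look like, here is a rough sketch. It is not the actual script from this post: the function names (`duty_cycle`, `burn`, `keep_busy`) and the 0.1 second period are assumptions made up for illustration. It starts one worker per logical core and has each one spin for 15% of every period and sleep for the rest. The OS scheduler decides where the workers run, so the per-core load is only approximate.

```python
import multiprocessing as mp
import time


def duty_cycle(target: float, period: float = 0.1) -> tuple[float, float]:
    """Split one period into (busy, idle) seconds for a target load fraction."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0.0 and 1.0")
    busy = target * period
    return busy, period - busy


def burn(target: float, period: float, stop) -> None:
    """Spin for the busy slice, sleep for the rest, until stop is set."""
    busy, idle = duty_cycle(target, period)
    while not stop.is_set():
        deadline = time.perf_counter() + busy
        while time.perf_counter() < deadline:
            pass  # busy-wait: this is the artificial load
        time.sleep(idle)


def keep_busy(target: float = 0.15, period: float = 0.1) -> None:
    """Start one worker per logical core and run until Ctrl+C."""
    stop = mp.Event()
    workers = [
        mp.Process(target=burn, args=(target, period, stop), daemon=True)
        for _ in range(mp.cpu_count())
    ]
    for worker in workers:
        worker.start()
    try:
        for worker in workers:
            worker.join()
    except KeyboardInterrupt:
        stop.set()  # ask the workers to finish their current period
```

To use it, call `keep_busy()` from under an `if __name__ == "__main__":` guard at the bottom of the script. The guard matters on Windows, where `multiprocessing` re-imports the script in every worker.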
I just had an invalid HTML error on my post here; the forum then auto-corrected it, removing the HTML issue. I then used the post button again with the corrected formatting. The site then threw a post flooding error at me, falsely believing that the last post went through. I now have to wait ten minutes to post because it thinks I am flooding when I didn't even post anything today. I will return in ten minutes and post this, but maybe AMD could fix that code error on their website and this chip issue.
Had the same issue on a new install on an Asus B550-F: tens of reboots a day, with the only clue being the WHEA-Logger error in Event Viewer, giving us fk all to work with except second-guessing the durability of other components... What worked for me is undervolting the CPU to 1.3 V (the mobo's default was 1.45 or something like that) and disabling Global C-state; every other option is untouched. Haven't had a crash since. I'm no OC nerd, didn't even want to go into the BIOS but had no other option. I would expect a 500 euro CPU to work out of the box.
So far, I have had 6 days without the error after increasing the Curve Optimizer to +5. About 1.5 days longer than the typical restart. We'll see if this keeps working, but so far it is solving the WHEA 18 issue without any changes to RAM, which points back to CPU quality.
Typically it would WHEA every 1.5 to 3 days. I am now on day 8 without a WHEA. All I did was keep DOCP on at 3600 MHz and change the Curve Optimizer to +5. I ran PassMark CPU tests 3 times before and 3 times after the change and subtracted the average after from the average before. It costs only 0.38% to run at Curve Optimizer +5, in other words nothing. So if this stays stable, this was the fix for my system. I don't care about that very nominal performance loss, as the system is already in the 99.9th percentile.
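If you want to repeat that before/after comparison on your own system, the arithmetic is just the drop in the average score as a percentage of the "before" average. A quick sketch follows. The scores in it are made-up placeholders, not the PassMark results from this post, and `percent_slowdown` is a name invented for this example.

```python
from statistics import mean


def percent_slowdown(before: list[float], after: list[float]) -> float:
    """Percent drop in the mean score from before to after (positive = slower)."""
    baseline = mean(before)
    return (baseline - mean(after)) / baseline * 100.0


# Placeholder scores only: three runs before and three runs after the change.
before_runs = [1000.0, 1010.0, 990.0]
after_runs = [996.0, 1006.0, 987.0]
print(f"{percent_slowdown(before_runs, after_runs):.2f}% slower")
```

Averaging a few runs on each side helps keep ordinary run-to-run noise from being mistaken for a real difference.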
Mine is fairly stable as well with the undervolting. In fact, I haven't managed to crash it yet with any sort of abuse I throw at it... Looks like each mobo and RAM combo, plus the silicon lottery, needs to be fine-tuned for stability. Oh well, this is what we get for bleeding-edge hardware.
I'd still rather burn out 5 chips before even considering the other "company".
Yesterday made it one month since I RMA'd my 5900X, which was giving me a really hard time with constant random reboots (WHEA errors) at idle or while simply watching YouTube videos. I tried every possible PBO setting to fix it.
Since the new CPU was installed, I've been on CO -30 all cores (offset -150) and have not had a single WHEA error since 20 Aug.
For the record, I am using 64 GB (4 sticks) of T-Force ARGB 3600 MHz CL14 RAM. Back with the old CPU, I even tried running only 2 sticks with XMP off, thinking it could be related to the memory controller.
So far so good.
Update on my thread:
I ended up buying another motherboard / CPU. When I placed the new (5800) CPU in the older motherboard it worked perfectly. I then put the CPU with the errors into the new motherboard (now an ASUS vs the original MSI), and got the same errors.
I finally decided to go through the hassle of an RMA. I am in Canada, and I had to pay to ship it to Florida (in the US).
It took a few weeks to get it back, but the new CPU worked without any problems. No issues and no crashing.
So eventually after 10 days, I had the WHEA 18 again with +5 on the curve optimizer. Just changed it to +6 and we'll see how long it is stable this time.
I had assumed everything was OK after over a month of no issues... then suddenly, a WHEA error. I pushed back the settings. Another one a few hours later. Turned to stock A-XMP settings, and bam, another WHEA 24 hrs later.
At this point, I have to conclude the CPU is crap. I'm contacting my supplier for an RMA, or failing that, will directly RMA with AMD. I feel like there's grounds here for another class action lawsuit. Kinda sad that every new CPU release by AMD is followed shortly by a class action lawsuit. My old FX CPU, with zero problems, netted me $60 in the class action, just because someone didn't like AMD's definition of cores. But for all the crap it got, I still have that old FX machine humming along, warming my feet as a beefy little NAS.
Feels like the 5950x was just hastily thrown together to give Intel the finger. Unfortunately, we're literally the victims here.