I should have clarified that apart from enabling DOCP to set the memory to 3600, all other settings have been left to their defaults. Before swapping the PSU the system could occasionally BSOD when at the Desktop though most commonly it occurred during gaming with the "Cache Hierarchy" error in the event log. This was when I was using the GTX 1080 card and I had to manually set +1 in the Curve Optimizer to stabilise the system.
After swapping to an RTX 3090, I started getting a blank screen when gaming and the system rebooted but would not proceed with booting Windows (there was a white LED constantly lit). The Curve Optimizer was still set at +1 for all cores. After swapping the PSU out, I no longer got the black screen/reboot and I decided to set all CPU options back to default so that single-core boost would peak properly.
Since I last posted here, I've played a good 10+ hours of games including AC Valhalla and Control with all settings maxed out, and the system has been rock-solid. I can only think that my EVGA PSU had developed a stability issue under rapidly changing loads. It seems to work fine with my spare system, which now has the old GTX 1080 and a Ryzen 3900X, so perhaps the 5xxx series of processors are more sensitive to the PSU, at least in my case?
5600X here. I have been battling WHEA uncorrectable errors for 3 months. After I RMA, I will sell the 5600X and the B550 motherboard and go back to Intel.
Not a word from AMD, though they were quick to take our money.
I'm not entirely certain my issue is the same as the problems found here, but I am getting WHEA 18 critical Kernel Power errors, so I'm going to share my experience in case it helps find a solution.
Ryzen 7 5800x
EK AIO 240
Asus Crosshair VIII Dark Hero
G. Skill Trident Z Neo 16GBx2 3600 CL16-16-16-36
Asus ROG Strix 3070
Samsung 970 Evo Plus 1TB
Seasonic Focus GX-850
This system was put together a couple of weeks ago and everything seemed solid. Cinebench, Superposition, Heaven and games like Doom Eternal and Gears 5 worked great. Skip to a few days ago where I tried to download another game from the Microsoft Store. In the early stages of the download the system just restarts. Event viewer shows the following:
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0
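For anyone comparing their own logs against this one, here is a minimal sketch of pulling the fields out of the event text so they can be compared across crashes. It assumes the description has been copied out of Event Viewer as plain text in the "Key: value" layout shown above; the function name `parse_whea_event` is made up for illustration and is not part of any Windows tool.

```python
def parse_whea_event(text):
    """Split the 'Key: value' lines of a WHEA event description into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (such as the headline sentence) are skipped.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the event quoted above.
sample = """A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0"""

event = parse_whea_event(sample)
print(event["Error Type"], "on APIC ID", event["Processor APIC ID"])
```

Comparing the "Error Type" and "Processor APIC ID" values between crashes is one way to see whether the same core keeps being reported.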
I had played with some PBO and CO settings in an attempt to lower temps, so I set those back to Auto and tried to download again. The download lasted longer, but the system ultimately restarted with the same WHEA error. The only other thing left to change in the BIOS was DOCP. I lowered it to 3200 MHz and FCLK to 1600 (down from 3600 and 1800). I am now able to complete downloads for a couple of games and have not seen the error with these settings... yet.
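A quick sketch of how those two pairs relate, assuming the usual 1:1 rule on Ryzen where FCLK matches the real memory clock, which is half the DDR4 rating. The function names are hypothetical and only there to show the arithmetic.

```python
def fclk_for_1to1(ddr_rate):
    """Return the FCLK (MHz) that runs 1:1 with a given DDR4 rating."""
    # DDR memory transfers twice per clock, so the real clock is half the rating.
    return ddr_rate // 2


def is_coupled(ddr_rate, fclk_mhz):
    """True when the memory clock and FCLK are running 1:1."""
    return fclk_for_1to1(ddr_rate) == fclk_mhz


# The two combinations mentioned in this post.
for ddr_rate, fclk in [(3600, 1800), (3200, 1600)]:
    print(ddr_rate, fclk, "1:1" if is_coupled(ddr_rate, fclk) else "not 1:1")
```

Both combinations are 1:1 under that assumption, so lowering the memory speed also lowered the fabric clock by the same ratio.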
The download trigger is odd. However, when I attempt to download through the MS Store, the WHEA errors consistently appear and force a restart when DOCP is set to 3600 and FCLK to 1800. Everything else is at its default/auto setting.
Is this an issue with my specific RAM, or does it relate to the CPU? Isn't FCLK tied to CPU voltage in some way?
Almost forgot to mention, my Asus BIOS version is 3401. I believe it has AGESA 184.108.40.206.
EDIT: Scratch that. I am no longer stable at 3200 and 1600. Just experienced another WHEA 18 while attempting another download.
Something I also found on my side is that once Windows throws WHEA errors due to PBO/CO, the system keeps crashing with WHEA errors even after the values are changed back to Auto, which was not the case before.
The only way to stop the crashes after setting it back to Auto is to clear CMOS, or reinstall the BIOS firmware to be sure. I'm not sure why, but it is as if the PBO/CO settings get stuck on the older values after a crash. (A simple test: once the settings are back on Auto, switch them to Advanced again. All the older values that were set and that triggered the WHEA errors reappear in the advanced settings. After a full BIOS reset this doesn't happen, and all the fields appear as if it were the first activation and you need to populate them.)
I've tested this several times on my Gigabyte X570 Xtreme. Every time I got WHEA errors, setting it back to Auto and reinstalling the BIOS firmware fixed it. You may want to try it on your side.
In the meantime, as for your WHEA error, your CO value on Core 0 may be too high.
I appreciate the suggestion, but my BIOS has been properly resetting the values. I have set them to Auto, manually set them to 0, hit F5, and I even just flashed the latest Asus BIOS. The issue persists. In fact, the latest BIOS update seems to have made it worse.
Ok, I guess I've got to get my two cents in:
We can get the high temps under control. That is the easy part.
This is what I tend to advise most people. The parameters pertain to the 105 W TDP processors, the 5900X and 5950X.
I would set VSoc from Auto to Normal
Then add a 0.006 V positive differential to VSoc
Secondly, I would set VCore from Auto to Normal
Then add a 0.006 V positive differential to VCore
The above boosts voltage by the smallest amount (which your processor needs)
The amount is less than what the CPU would boost when it changes frequencies.
A very slight boost of voltage for VCore and VSoc is what I recommend for most crashes. (A little goes a long way)
The above should keep you stable at the lower frequencies while also boosting voltage at the higher frequencies
Try this, it should help. If it doesn't totally cure crashes at idle, then use 0.012 V for the VSoc and VCore differential. (Still a VERY minor increment)
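To put those two numbers in perspective, here is a small sketch that assumes the board exposes offsets in 6.25 mV steps. That step size is an assumption on my part (many boards show 0.00625 V as 0.006), so check what your own BIOS offers. `STEP_V` and `offset_for_steps` are names invented for this example.

```python
# Assumed offset granularity in volts; your BIOS may use a different step.
STEP_V = 0.00625


def offset_for_steps(steps):
    """Return the positive voltage offset for a number of BIOS steps."""
    return steps * STEP_V


# One step is roughly the 0.006 V suggestion, two steps roughly 0.012 V.
for steps in (1, 2):
    print(f"{steps} step(s): +{offset_for_steps(steps) * 1000:.2f} mV")
```

Under that assumption the first suggestion is a single step up and the fallback is two steps.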
To get those higher frequencies under control I suggest:
Set Core Performance Boost to Enabled
Turn PBO from Auto to Advanced
Set PBO Limits to Manual
Turn PPT from 142 W down to something like 120 W
When the processor sees that it is approaching the new, lower power limit, it is discouraged from boosting to an even higher frequency.
Keep TDC at 95 A and EDC at 140 A.
Set "Platform Thermal Throttle Limit" to Manual. Then, in the next field that pops up, set the limit down from 90 to 75, or whatever temperature in Celsius you feel comfortable with.
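The limits above can be written down side by side as a quick sketch. The stock and suggested numbers are the ones from this post; the `PboLimits` class and `ppt_reduction_pct` helper are hypothetical names, not anything the BIOS or AMD software exposes.

```python
from dataclasses import dataclass


@dataclass
class PboLimits:
    ppt_w: int            # package power limit in watts
    tdc_a: int            # sustained current limit in amps
    edc_a: int            # peak current limit in amps
    thermal_limit_c: int  # platform thermal throttle limit in Celsius


# Values quoted in the post for the 105 W parts.
stock = PboLimits(ppt_w=142, tdc_a=95, edc_a=140, thermal_limit_c=90)
suggested = PboLimits(ppt_w=120, tdc_a=95, edc_a=140, thermal_limit_c=75)


def ppt_reduction_pct(before, after):
    """Percentage by which the power limit is lowered, to one decimal."""
    return round(100 * (before.ppt_w - after.ppt_w) / before.ppt_w, 1)


print(f"PPT lowered by {ppt_reduction_pct(stock, suggested)}%")
print(f"Thermal limit lowered by {stock.thermal_limit_c - suggested.thermal_limit_c} C")
```

Only PPT and the thermal limit change. TDC and EDC stay at their stock values.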
Yes, Core Performance Boost can make the system run very hot. PBO not so much. If you turn them off you lose lots of performance. But if you enable them and let them run with a more limited power budget (you don't really want to hammer that CPU, do you?) and a tighter limit on temperature, you get much of the same boost as before, sometimes even greater, but without the negative stuff. Try it. I think you will like it.
Let me know if anyone wants to try all that and see if it works. I've never in my life spent as much time in a BIOS screen as I have in the past week. I expect this CPU and its components to run stable at stock settings. If that's too much to ask, then I am going to take advantage of my return window and switch back to Intel. I'll take the hit to performance and efficiency if it means I don't have to deal with this error anymore.
I just found this discussion: https://community.amd.com/t5/processors/ryzen-5000-crashes-whea-errors-will-get-a-quot-silent-fix-qu...
If that's the case then I am definitely just going to return and get a refund. This is ridiculous.
@authorized_to_ill that information is nothing new, it's speculation at best. The only statement I've heard is that B2 is to improve manufacturing efficiency.
If this was a widespread issue it would be all over the news.
I have to imagine that while many are affected, it's probably not enough to land on anyone's radar.
@eliotcole If you want my PC specs, you can look at all my posts and scroll to the beginning. There's a good list in the first one, about black screens with Resizable BAR, but I usually post them when I first join a thread, so there are more details later, from when I started noticing the WHEA errors.
If you have any specific questions about my hardware after that let me know.
The only way I've been able to reproduce it is to leave it on and wait. It used to happen within several days, but every now and then it'll seem fixed for a couple of months.
I've looked into the PSU, I've replaced the CPU, and I've ruled out the graphics card.
I'm now looking into the RAM situation. AMD 5000 series CPUs are only officially rated for memory up to 3200 MHz. I'm used to Intel and previous AMD systems just supporting any memory on the motherboard's QVL, but maybe it's not that simple with Infinity Fabric. It fails like a CPU issue, not a RAM issue, so I wonder if the CPU acts up with faster RAM.
Many of us have 3600 MHz memory, and there are lots of reports of RAM issues with Ryzen. I'm used to just checking the QVL, enabling XMP, and the system being stable, but maybe it's more complicated than that.
So I'm going to learn all about what makes RAM different on a Ryzen system and get back to you if I find any interesting results.
@Cmdr-ZiN The thread is fairly new, so I assumed it wasn't common knowledge. To be clear, I wasn't trying to help spread conspiracies or anything. I just wanted to spotlight PBO and CO as the potential culprits behind these issues. I have disabled both and I am now running my "Microsoft Store download test" to see what happens. As I mentioned before, downloading through there has consistently created the WHEA errors/restarts. I would love to know if anyone else can recreate the error the same way. Sometimes I crash after 3-5 gigs are downloaded. Other times I can download 100 gigs before a crash. Either way, it's the only time I get the error.
As for the RAM theory, I considered this as well. That's why I brought my RAM speed down to 3200 and FCLK to 1600. I thought that was good enough for a while, but then the same WHEA errors started happening. I haven't tried going below 3000, but at that point I am 100% on the road to a return. Heck, I'm pretty much already out the door. I'm curious though, so I'm giving it one more night of testing.
EDIT: Alright, well that didn't take long. PBO and CO disabled.... still crashed. Same WHEA 18 error. I am completely out of ideas. Hope you guys can figure it out.... eventually.