Alright, time for an update on my original post from earlier in the week.
I had originally requested a warranty replacement (RMA) from AMD and described the problem rather thoroughly, but the request was redirected to their technical support department instead. The CSR provided a list of eight suggestions to try, and to be fair to them, I have gone through them all and recently wrote back with my results. There were honestly no suggestions in the list that I hadn't tried before, other than explicitly setting the "Power Supply Idle Control" option to "Typical". Unsurprisingly, the suggestions haven't made any difference - WHEA_UNCORRECTABLE_ERROR randomly occurs anywhere between 30 seconds and two hours after booting. Stress-testing the computer seems to reduce the chance of it happening - letting the computer idle for a while and then launching some programs seems to trigger it (and even then, not reliably).
Since that time, I've tried buying another motherboard and moving my components onto it. I purchased an Asus ROG Strix B550 Gaming-F Wifi motherboard - I explicitly chose something from a different manufacturer and with a different chipset than my regular MSI MAG X570 Tomahawk motherboard. After all was said and done: the WHEA errors returned. Everything seemed fine at first, but they always come back. I'll also mention that I tried this setup with a completely fresh install of Windows 10, as well as in Linux - the error in Linux is a Machine Check Exception (if I get any error at all and not just "immediately reboot and black screen"), which seems to be a reasonably close analogue of the Windows WHEA error.
At this point, I don't know how I can conclude that the source of the problem is anything other than the 5900X itself. If I've tried two different PSUs, two different motherboards, two different RAM kits, two different GPUs, two different hard drives, and two different operating systems, but only one CPU causes the problem (keep in mind that my 3700X works fine in all of these configurations), then how can it be anything but something wrong with the 5900X itself? (For the curious, my production batch number is BG 2051PGS.)
I realize that there might be ways to force the 5900X into working by tweaking RAM and/or SOC voltage, disabling XMP, disabling PBO, disabling CPB, but honestly: at the end of the day, should we even need to do that? If a CPU doesn't work correctly with all of the default settings, then something is wrong, no?
In any event, I responded to AMD's technical support with my findings. I'm hoping they'll agree that an RMA is a reasonable way to go forward.
Not over yet.
This is my first AMD build. I had always used Intel until now.
Actually, I built an AMD PC for my father 3 years ago, but it was a low-budget PC.
I got my Ryzen 5900X about a month ago. At the start I had default settings in the BIOS; only XMP was enabled. In the first week I started noticing that my PC was restarting after it had been in power-saving mode. Event ID 18.
Processor APIC ID: 0 (this one always shows)
Processor APIC ID: 11 or 18 or 24... this one is different
Crashes happen about once a day, or maybe once every 2 days.
A week ago I set up undervolting in the BIOS with PBO disabled. Since then I've had only 2 restarts; it's better but not gone. The restarts happen when the PC has been idle for a long time (2-5 hours maybe). It never happened when the CPU was under load. Really strange...
I decided to wait a few weeks to see if AMD finds a solution for this, since I believe this might be software related (BIOS or chipset drivers) and not hardware... but only AMD can confirm this!
If not, I'll probably start an RMA, just to get a new CPU. And if I get a new CPU, I will just sell it and go back to Intel.
It's really annoying to pay so much money and then have so many problems, when you expect a top high-end rig to work flawlessly.
"i will just sell it and go back to Intel"
Unfortunately, there is no Intel alternative to the 5900X/5950X.
The 11900K is going to be 8 cores only...
Hi. My first 2 weeks with a 5900X were very stable, but then I suddenly started getting WHEA errors as well: "Cache Hierarchy Error", "Bus/Interconnect Error", or "A fatal hardware error has occurred".
5900X BG 2042PGS
AGESA V2 PI 126.96.36.199 on Asus B550 (seems slightly better than AGESA V2 PI 188.8.131.52)
Wanted to share for solidarity.
"My first 2 weeks with a 5900X were very stable"
It's pretty strange that the issues appeared after 2 weeks. Maybe something was changed in your build?
If you have the latest GPU from AMD, make sure that you are not running HWInfo at the same time;
see https://www.overclock.net/threads/replaced-3950x-with-5950x-whea-and-reboots.1774627/post-28744361 for details.
Can you also share whether you have any reliable way to reproduce these issues?
Same story with me.
I'm not a computer novice: I have 20+ years in IT hardware service (including board level). Nothing had changed from the first setup of my 5900X system. It was running perfectly for 2 weeks, all stock besides memory, and then I got a hard reset with a WHEA 18 error while using Chrome.
You can stress test the CPU all day and it won't happen; it's only at idle or low loads that you get hard resets/WHEA errors.
This indicates there's a voltage drop when the CPU isn't under normal loads.
Here's the catch: increasing voltages slightly does nothing besides cause way more heat under heavy loads; you will still get the WHEA errors.
But why doesn't it happen straight away, or a few days in once the machine is set up? Why 2-3 weeks later, when nothing has changed or been updated in the system? To me that comes across like degradation within the CPU, which then needs more consistent voltage even under lower loads.
To reproduce the issue, sit and wait at low loads, i.e. Chrome (doing anything), email open, etc., then walk away.
I for one don't trust AMD's RMA replacements (if I could have got one), because the same problem will arise, and has for many others with their RMA replacements.
You can't put the blame on the users all the time. With what now seems to be hundreds of people having the EXACT same issue with different combinations of hardware, the determining factor is these Ryzen 5000 CPUs, namely the higher-end ones.
I've been running a 10900K Intel rig since taking back my 5900X/Dark Hero. I have hammered the CPU with a good overclock and it's completely stable, left on overnight for days. Two fewer cores, yes, but I'll take that over instability.
@schoolofmonkey I was never a fan of AMD or Intel. I always selected the CPUs based on specs. CPU history from my PCs:
-Intel Pentium 133
-Intel Pentium 4
-AMD X4 965
I never had any issues with CPUs before. And now I have been unable to build a PC for 3 months already. I understand that Zen 3 is a new architecture, and I understand that it is complex. But I do not understand why AMD is silent. I mean, of course it's bad for the stock if a company makes an official statement that there are some issues with a product. But in the long term, the company will keep its customers because of trust.
I'm still looking to build a PC based on the 5950X, but at the same time, I'm feeling that I'm becoming an Intel fan...
I might change to Intel because of this. AGESA 184.108.40.206 seemed fine for a couple of weeks, but when I played Division 2 the reboots and BSODs came back again. Very frustrated with AMD right now. The RMA would mean too long a wait for the new chip.
I have the same issue on a 5600X.
I disabled CPB and manually set the CPU clock ratio to, for example, 44.00 and the Vcore voltage to 1.250, and it works fine with no crashes.
You can also try a ratio of 47.00 with a Vcore of 1.33.
I did a Prime95 stress test.
But now the CPU idles at a fixed frequency. Should I go for an RMA?
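For anyone wondering what those fixed settings work out to, here is a rough sketch of the arithmetic. It assumes the default 100 MHz base clock (BCLK), which the post doesn't state, and it reads the second setting as a ratio of 47.00 at 1.33 V. The function name is made up for illustration.

```python
# Assumption: default base clock (BCLK) of 100 MHz; not stated in the post.
BASE_CLOCK_MHZ = 100.0


def fixed_clock_mhz(ratio, base_clock_mhz=BASE_CLOCK_MHZ):
    """Return the fixed core clock in MHz for a given CPU clock ratio."""
    return ratio * base_clock_mhz


# (ratio, Vcore in volts) pairs as quoted in the post above.
settings = [(44.00, 1.250), (47.00, 1.33)]

for ratio, vcore in settings:
    mhz = fixed_clock_mhz(ratio)
    print(f"ratio {ratio:.2f} at {vcore:.3f} V -> {mhz:.0f} MHz fixed")
```

With CPB disabled and a manual ratio, the chip sits at that one clock instead of boosting and dropping, which is why it no longer clocks down at idle.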