Processors

steveoeditz
Adept II

AMD 5900X WHEA Errors

The problem is as follows: the system boots and works great. I can watch YouTube, edit in Resolve, and use Photoshop. As soon as I launch a game, I can only play for 15 min.

 

My system:

CPU 5900X

GPU 6800XT

PSU Be Quiet Dark 1000W

RAM Patriot Viper 4133, 4x8GB (currently set to 3533 with fabric clock 1733; no DOCP, just set to auto)

Motherboard ASUS TUF X570 WiFi Plus with the newest BIOS (4021) installed

I have it overclocked and everything in the BIOS set to auto. I have enclosed pictures as well.

I have made numerous changes, such as upping the SoC and Vcore voltages, disabling C-states, and changing the RAM to 3200, and it's still crash after crash. I know I'm going to have to buy another 5900, but what can I do in the meantime just to be able to play a game? Is there anything at all that can be done? Or should I just swap my 5800 in and pray it doesn't do the same thing? I do have a spare MOBO as well, but I haven't read anything about anyone really having MOBO issues. I bought this CPU brand new from another party, and yes, while I have the receipt, I believe they would have to start the RMA process, and I'm sure that's something the other party wouldn't be interested in doing.

Attachments: Screenshot 2021-08-27 010849.png, Screenshot 2021-08-27 010241.png, Screenshot 2021-08-27 010305.png
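A quick aside on the memory numbers in this post: DDR4 ratings are transfer rates, so the actual memory clock (MCLK) is half the rated speed. Below is a rough sketch of that arithmetic, assuming the commonly suggested target of running the fabric clock (FCLK) 1:1 with MCLK. The function names and the 1 MHz tolerance are made up for illustration and are not from any AMD tool.

```python
def memory_clock_mhz(ddr_rating: float) -> float:
    """DDR transfers twice per clock, so MCLK is half the DDR rating."""
    return ddr_rating / 2


def is_one_to_one(ddr_rating: float, fclk_mhz: float, tolerance_mhz: float = 1.0) -> bool:
    """True when FCLK matches MCLK within a small tolerance (hypothetical helper)."""
    return abs(memory_clock_mhz(ddr_rating) - fclk_mhz) <= tolerance_mhz


# Values from the post: RAM at 3533 with fabric clock 1733.
print(memory_clock_mhz(3533))     # 1766.5
print(is_one_to_one(3533, 1733))  # False
# 3200 with FCLK at 1600 would be a matched pair.
print(is_one_to_one(3200, 1600))  # True
```

By this arithmetic, 3533 with a 1733 fabric clock is not a matched pair, which may be worth ruling out before blaming the CPU.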

0 Likes
92 Replies

I bet you're in performance mode for power management, which is why the minimum CPU state is 100%. Switch over to balanced for now.

I think your 5900X is basically f'd. Swap in the 5800X and it will probably be fine. You'll need to RMA that 5900X immediately.

Or… you can go a little nuts. What I am going to say next will be done at your own peril; only attempt this if you feel confident and can be extra careful. It's a long shot, but the other step is to overvolt the 5900X. I know that the CPU Vcore can fluctuate around 1.1-1.4v. It could be that the signal isn't clean enough, so it assumes errors when there aren't any, so like in overclocking we can set the Vcore slightly higher to improve stability. You could fix the Vcore at 1.42v (assuming your cooler can handle the extra heat) so that it constantly has that, or apply an offset so it adds like 0.25v to whatever it dynamically switches the Vcore to on auto, totally up to you. Just do not go too high and you should be fine; generally for Zen 3 the danger zone is 1.5v. It might help if you're still encountering issues even after disabling CPB and PBO.
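To sanity-check an offset before typing it into the BIOS, the numbers quoted above can be run through a small sketch. It assumes the offset simply adds to the top of the 1.1-1.4v auto range (it ignores load-line behaviour and droop), and the 0.025 and 0.05 offsets are hypothetical comparison values, not a recommendation.

```python
DANGER_VCORE = 1.5             # "danger zone" figure quoted in the post
AUTO_VCORE_RANGE = (1.1, 1.4)  # auto fluctuation range quoted in the post


def peak_vcore(offset_v: float, auto_range=AUTO_VCORE_RANGE) -> float:
    """Highest voltage reached if the offset is added on top of the auto range."""
    return auto_range[1] + offset_v


def stays_below_danger(offset_v: float) -> bool:
    """True when the worst-case voltage stays under the quoted danger figure."""
    return peak_vcore(offset_v) < DANGER_VCORE


# 0.25 is the offset mentioned in the post; the smaller values are hypothetical.
for offset in (0.025, 0.05, 0.25):
    print(f"+{offset:.3f} V -> peak {peak_vcore(offset):.3f} V, "
          f"below 1.5 V: {stays_below_danger(offset)}")
```

Under these assumptions a +0.25v offset on top of 1.4v peaks at 1.65v, which is above the 1.5v figure, so a far smaller offset is the safer reading.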

0 Likes

So another member on here helped me go down the rabbit hole, and we just did a static overclock with PBO and CPB off. The highest it got was 4.1 before it crashed into the ground. This ASUS mobo does strange things with voltage, so tomorrow I'm gonna try the other motherboard and more than likely RMA.

@steveoeditz I'm in a similar situation. I'm on my third 5800X, and each behaved a little differently.
I have a similar setup: ASUS B550, G.Skill 3600MHz, same timings as you.

I noticed at the beginning you were trying to set your memory with DOCP off but you still had it at 1800MHz. You might want to clear your CMOS just to make sure your BIOS settings are set correctly. I know the ASUS BIOS can show one figure but have another in the background, especially if you use the easy setup.

With my 3rd CPU installed last week, I had a couple of crashes straight away. I was on the latest BIOS.
Everything else is stock with DOCP enabled.
I then set PBO to Disabled. It should have been disabled on "Auto", which is the default, but some say this isn't happening.
With Auto it's meant to be controlled by Ryzen Master; I don't have Ryzen Master, so it should be off.

I also updated my chipset drivers in Windows; they hadn't been updated since March.

Since both of those changes my issue hasn't reoccurred. It's only been 5 days 24/7, so it's far too early to say it's fixed, but it's worth trying a CMOS clear and making sure you're on the latest BIOS and chipset drivers.

I still feel you're going to need an RMA; it seems as bad as my first chip. I dunno about my 3rd chip yet.
My second chip, however, was perfect but then up and died one day.

I only got WHEA errors on my first chip. They stopped on the 1st of April; I think I did a BIOS update then. It was still unstable though.

Anyway, good luck.

0 Likes

Whoa, that Chipset temp at nearly 60C on a system that's basically idling?  And NB Current at 35A?  That seems wrong to me.  My chipset temp is 44C and NB Current is around 8A at idle.  Mine is a 5800X on a B550, and I know the X570 does consume a bit more power, but those numbers are ridiculous.   Unless HWinfo is reading something incorrectly.

If those numbers are real, and you've disabled CPB, imagine what happens to those numbers when CPB is enabled and the CPU boosts up.....that's what I'm thinking.  But maybe I'm off base.

 

 

0 Likes

OK, gonna give this a go. I do have another set of RAM, a spare motherboard, and a 5800X as well.

0 Likes

Try the RAM first; if it still keeps crashing, switch over to the 5800X. I have a good feeling the 5800X will be okay. The RAM being the issue is 50/50.

0 Likes

5800x lol

0 Likes

I have a set of G.Skill Trident Z. I think I'm just gonna pull the Patriot out and try them.

0 Likes

Hey,

I have a 5900X and a B550 MSI Tomahawk and also have random reboots while gaming.

I've RMA'd my 5900X and got a new one, but I still have the same issues. So I wanted to try your recommendations.
My mobo does not have the differential field, so how do you suggest adding your solution to my settings?

When I go to the Vcore and VSoC settings I can choose + offset, but it doesn't let me put in +0.006v.

Steps 1, 2, 4 and 5 I already did.

cheers,

Yede

0 Likes

So you were getting WHEA errors that were correctable, and critical errors such as a fatal hardware error? If you RMA'd and it's still doing the same thing, I'd advise trying a different motherboard altogether. The CPU should work out of the box, without requiring certain voltage tuning just to play a game with the added boost clock of 4.7.

Hey,

Thanks for the reply.

They were not correctable and hard crashed my PC, so it completely restarted.
It always happens during gaming, mostly a crash every 3-4 hrs. I have default settings and all. The Processor APIC IDs are all over the place, so it's not just 1 core.

Yes, my thoughts exactly, but I am not in a position to just try another mobo atm.

Also, I have the latest stable version of the BIOS and the latest chipset drivers from the AMD website.

So I am trying the recommendation from Gwillakers to see if that fixes it for now.

It is a **bleep**ty situation, to be honest.

Cheers,

yede
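One way to check whether the errors really are spread across cores is to paste the WHEA event messages into a list and count the APIC IDs. This is only a sketch: it assumes each message contains text like "Processor APIC ID: N", the sample messages are made up, and `tally_apic_ids` is a hypothetical helper rather than part of any Windows tool.

```python
import re
from collections import Counter

# Assumed message wording; adjust the pattern to match your own logs.
APIC_PATTERN = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(messages):
    """Count how many error messages mention each APIC ID."""
    counts = Counter()
    for message in messages:
        match = APIC_PATTERN.search(message)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Made-up sample messages, for illustration only.
sample = [
    "A fatal hardware error has occurred. Processor APIC ID: 4",
    "A fatal hardware error has occurred. Processor APIC ID: 17",
    "A fatal hardware error has occurred. Processor APIC ID: 4",
    "Unrelated event text",
]

counts = tally_apic_ids(sample)
for apic_id, hits in counts.most_common():
    print(f"APIC ID {apic_id}: {hits} error(s)")
```

If one or two IDs dominate the tally, that points at specific cores; an even spread fits the "all over the place" description above.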

Gonna be 100% honest: if you're getting uncorrected errors and crashes, it's only a matter of time before this CPU won't work either. We did all kinds of troubleshooting and found out that on a static overclock the highest my CPU would operate at was 4.1, and then it would crash. With my BIOS set to defaults, the system boots to Windows and shuts down. Before this it would do what yours is doing: work for 4-5 hours, throw WHEA errors, and then reboot. But if yours is an RMA replacement and it's acting like this, you need to RMA again.

0 Likes

@sciurus it's a really tricky situation for an end user to troubleshoot without a wealth of spare parts to use.

It's also affecting lots of people. I don't know the percentage; in the grand scheme of things it can still be quite low, but a few are getting replacement CPUs that also exhibit the same issue, and then another replacement works.

It could also be many other components in your build.

It's really kind of random and there are no official statements pointing in the right direction.

I do wonder if it's the combination of fast components that pushes it over the edge, including a Gen 4 NVMe. I wonder if memory at a max of 3200MHz that is badged 'made for AMD' would help the situation. However, there are no official warnings on RAM except the 'supports up to 3200MHz', which isn't on the box. There's also a lot of marketing material showing 3600MHz memory and saying that 4000MHz is the new 3800MHz.

I also heard a story; I think it was a 5800X but it may have been one of the other 5000 series chips. Two friends built identical rigs, and one got the instability issue. Just to rule out the CPU, they swapped CPUs, and then both PCs were fixed. This seems like a very subtle instability we're trying to track down.

I did find I had zero issues with my cousin's verified stable 3600 on the same platform. Many have swapped out MOBOs without change, but I do think I recall one case where it helped; however, I don't know if the issue returned for that poster, and it's certainly possible. It could be anything, unfortunately.

0 Likes

@sciurus If you go to the offset field and cannot enter anything, I believe it is because you did not turn Vcore from Auto to Normal (or maybe it is Manual on your PC).

The same should be true for VSoC. First turn it from Auto to Normal, then adjust the offset field.

A lot of times the offset field is a drop-down box. But use only the lowest positive increment, like .006, or .012 if that didn't work.

 

0 Likes

Hey,

Thanks for your answer.

I have an MSI B550 Tomahawk. I cannot change it from auto to normal. I have the choice of offset, override or override + offset and AMD overclocking.

I tried putting it on offset and add +0.006 but it won't take it, it switches back to auto again when I press enter. I guess MSI is weird. 

Thanks,

Yede

0 Likes

You may have to clear the CMOS for the change to take effect.

0 Likes

Hey,

Yes, it is hard. But in a way a fun challenge too.
it will feel good when I find the problem.

When my new CPU did it again, I knew it had to be my RAM or mobo, so that is maybe something to figure out.
I think the chance of getting 2 broken CPUs would be really small.

It is also so random. Today it crashed like 3 times within 45 min, and on Monday I could game for 5-6 hours without a crash.

Cheers,

Yede

 

0 Likes

@Cmdr-ZiN 

Do you know if the issues are happening with Windows 11?

0 Likes

@sciurus "Yes, it is hard. But in a way a fun challenge too.
it will feel good when I find the problem."

Sounds like something I said 6 months ago.

Many have gotten 2 CPUs with the issue. My second CPU was from Malaysia, not China, and it was perfect but died after 2 months, so it behaved differently from the others. Although in its last week it also started randomly restarting.

@crayraven I'd say the issue will be the same in Windows 11, although I haven't tried.

I have had it happen after finishing a memtest86 run, while the system was just sitting idle in the menu.

Also others have reported the same issue in Linux.

Your results might vary.

 

Does anyone know a reliable way to reproduce the issue?

The most reliable way I've found is to leave it idle for several days, although after the chipset driver update I couldn't get the new CPU to crash in 7 days. I then ran a Time Spy test and it failed after finishing the first test.

After a reboot I couldn't reproduce the issue after several attempts.

@Cmdr-ZiN 

I'm starting to wonder if it could be related to NVMe and PCIe 4.0.

I'm interested, because I do have 2 M.2 drives and of course the 6800 XT in the PCIe slot it should be in, but I don't know how that would explain the voltage shifts I see. It's almost like my issue is that my motherboard or CPU doesn't know when to stop boosting, goes all out on auto, and then crashes.

0 Likes

@steveoeditz 

I honestly don't think AMD has made a bunch of bad CPUs that just die and have severe issues. It just doesn't make sense at all. I'm doing some tests that will take a few weeks to show results. But I'm just starting to think it might be NVMe. Can I ask, are your M.2 drives PCIe 4.0?

0 Likes

The GPU is a Sapphire 6800 XT (AMD). For both M.2 drives I'm gonna have to check. I'm about to swap my 5900 to the other motherboard and see what results I get. About your comment on bad CPUs: my 5800 has another die on it that's disabled, which isn't a bad thing and goes back to the high yield rate for production, but isn't that strange? Seems like it was destined to be a 5950, didn't meet specs, and became a 5800??

Hey,

I also have 2 Gen4 NVMe drives, so it could be that. But it would be weird tho.
From what I read on multiple sites and posts, what people are experiencing is also so different, other than of course the same WHEA error. Also, a lot of fixes help some people and not others. It seems so random to me.

Cheers

0 Likes

@steveoeditz 

 

I think that's just how CPUs and GPUs are made.

0 Likes

@crayraven that was one of my first thoughts several months ago. I didn't remove my Gen4 NVMe, but I did disable it in BIOS and had the same issues. That doesn't fully rule it out, but it stopped being my primary suspect.

I stopped having WHEA errors with my original CPU on April 1st. I dunno why, probably a BIOS or software patch. My first CPU was improving with updates and crashing less often, but still crashing. I lost my HDD not long after. Keep in mind HDDs don't like frequent sudden power-offs; I think it was boot looping one night.

There are definitely CPU issues, I can guarantee that. The question is how much, and whether that is the main problem. My first CPU was unstable even running my RAM at 2133MHz; if it had only been unstable when using DOCP, that would be one thing.

My second CPU was perfect in the same system at first, which makes me think it's less likely the system is at fault. That CPU degraded after 2 months, I don't know why. The first CPU didn't degrade; it got better over several months, so that makes me think it wasn't the system's fault.

The third was a little unstable at first with DOCP at 3600MHz. I heard the system trains itself, but I have no idea if that's true. It's only crashed once since I updated my chipset drivers, and that may have been a software or Radeon driver thing, although the event log timings don't quite line up for that. It was getting a bit glitchy after several days online; several pieces of software were getting errors.

Still, it seems far more stable than the first CPU.

So even the 3 CPUs I've had have all behaved differently, which you really shouldn't be able to tell. However, there's also a history of the issue with the 3000 series: some were fixed with an RMA, some eventually by a BIOS update, and there are also many reports of it being picky with RAM. It may not be a reason for an RMA, but a large number might not support 3600MHz memory, and good 3200MHz memory isn't very common anymore; it's all higher speeds.

There's also potential for more than one issue, like a Radeon issue, so just when you think you've solved it, another bug messes up your test that should have been successful.

I've run many burn-in tests, OCCT and several others, and none can make the PC unstable. I wish there was a reliable way to reproduce the issue, but it's just so random.

Anyone know a reliable test?

 

@Cmdr-ZiN 

 

That sounds like an absolute nightmare. You are probably right about the CPU having issues. But I'm trying to understand why there are others who don't have the issue at all. I'm wondering if it's a combination of different things that causes or contributes to the instability, or if it's just the CPU somehow. And what is the % of people experiencing it? I honestly don't want to RMA because I feel I'll just get caught in a loop and eventually AMD won't RMA anymore. Maybe it's best to wait for the B2 stepping and see how it fixes issues? I don't know.

 

But I do know no one should be going through this after paying premium prices for these CPUs. It's absolutely sad that AMD has yet to say a word about it.

Honestly, I'm starting to think it may be a motherboard issue. What motherboards do you all have and what RAM are you all running??

0 Likes

Dropping some of my information here in the hopes it helps in any way. I'm on my 4th 5900X and still having WHEA-18 at idle, ever since the end of February this year.

MSI MPG Gaming Plus - BIOS 7C56v181 (Latest available, with ComboAM4PIV2 1.2.0.3c)

2 M.2 Drives - WD Blue SN550 1TB NVMe Gen3 x4 PCIe

G.Skill Ripjaws V 32 GB (2 x 16 GB) DDR4-3600 (WHEA-18 no matter if it's stock or listed XMP/DOCP)

Have you tried it with only one ram stick in? 

0 Likes

Good idea. I meant to try that, but I've just been too tired.

On the off chance that does work, any idea what that would mean going forward? A new make/model of motherboard? Replacing the RAM with a different kit?

0 Likes

So I was watching a video from a popular YouTube optimization channel that goes by the name of frame chaser, and he was talking about memory and what memory to buy that is B-die and has no issues no matter the motherboard. He then mentioned a type of RAM that has issues with numerous boards, and it got me thinking: even if it's on the QVL that says it's compatible, who's to say they even tested every single bit of RAM on those boards prior to release? What if your RAM is the problem?? I don't know. Like you, I've been sick with a cold and was supposed to work on it yesterday and today, but I'm probably not gonna get to it till tomorrow.

0 Likes

It feels like a ghost is causing these problems, considering my system is stable under a graphical load. I'm willing to accept any and all lines of troubleshooting. Do you remember the specific name of the video? I'll give it a watch.

https://youtu.be/RoF9HhELiRI I so badly want to reach out to him and have him investigate it himself, because I'm sure he's the type who'd be able to figure it out. Maybe I'll see what he says.

0 Likes

Have you guys tried ClockTuner for Ryzen? The diagnostic says I have a silver sample CPU. I guess that isn't too bad.

 

https://www.guru3d.com/files-details/clocktuner-for-ryzen-download.html

@crayraven I'll check out ClockTuner later. (Edit: how do you determine your CPU quality? Is it just by seeing what it clocks to? Edit2: found a guide to the diagnostics :P)

I saw on a forum that 25% of people couldn't run 3600MHz memory. That doesn't mean the CPU is a dud, it just can't get that high. It was a small sample size though.

I think a lot of us are used to Intel handling whatever speed memory, but Ryzen only officially supports up to 3200MHz. Yet so much official marketing shows 3600MHz memory. Good 3200MHz memory isn't that available anymore; the market seems to have more options at 3600MHz.

I think in part there are CPU issues, but also many CPUs can't handle higher speed memory. However, I haven't tried my latest CPU with stock memory speeds because the issue is so hard to reproduce.

Still, with my first 2 CPUs the RAM speed made no difference. However, my RAM was removed from my ASUS QVL for 5000 series CPUs only; either that or I misread it and it was never on it. It didn't used to be on the G.Skill website for my MOBO, but it is now on the G.Skill QVL.

I don't think it's a particular MOBO for most people. I've seen several models online with the issues, and many people have swapped MOBOs and the issue remained. My MOBO is an ASUS B550M Plus WiFi, however.

I do think that for most people it's most likely a combo of RAM and CPU lottery, but for a few it's just some unlucky CPUs.

However, there have always been rumours of high RMA rates, but the main one was taken down without a statement as to why.

I keep feeling something is causing the instability in the CPUs, but I can't figure out what.

It's not acceptable really. If I contact AMD they'll probably just say to RMA this or that. I doubt I'll get any answers, and that's what we really need.

0 Likes

@crayraven so according to CTR, I have a Golden sample and my system is completely stable.

I do wonder if the chipset update or manually setting PBO to disabled, instead of leaving it on auto (which should also mean disabled), made a difference. It only ever crashed twice before then, and the one time after it, the crash was a little different.

I also wonder if the system auto tunes or something and gets a little better after the first week after a hardware change or BIOS reset.

There might be more than one cause for crashes and I might have eliminated one or two.

I feel my first CPU was the worst for the crashes by a long shot.

Normally I wouldn't worry about this at all, if this was my first CPU and hadn't had that terrible experience with my actual first CPU and then my second CPU randomly dying. I'm now hyper aware of any instability. I suspect if most had a CPU like I have now then it wouldn't be reported or noticed.

0 Likes

@Cmdr-ZiN 

 

I spent some of the weekend reading about PBO and Curve Optimizer so I can get a better overall understanding of how Zen 3 operates. I think why WHEA 18s occur and how to fix them is really simple. It's just that certain cores aren't getting enough juice. It doesn't mean that the CPU is broken per se, but those cores can't operate within stock settings. The solution is just to find the cores that are crashing and use Curve Optimizer to give them some extra power.

It's up to each person if they want to do this or not. Ultimately though, it's AMD's responsibility to provide each user a CPU that works within stock settings. I don't think a BIOS fix or anything like that could ever fix these issues, since it's individualized for each CPU. The real issue is that there is a one-size-fits-all model for these CPUs and not all CPUs fit into that model.
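As a rough illustration of "find the crashing cores and give them extra power", here is a sketch that turns per-core error counts into positive Curve Optimizer counts. Everything in it is an assumption for illustration: the 30-count cap (the range many BIOSes seem to expose), the 5-counts-per-error step, the sample error counts, and the `suggest_offsets` helper itself. It is not an AMD method, and how WHEA APIC IDs map to BIOS core numbers would still need to be worked out separately.

```python
MAX_COUNTS = 30  # assumed Curve Optimizer limit per core


def suggest_offsets(errors_per_core: dict[int, int], step: int = 5) -> dict[int, int]:
    """Give each core `step` positive counts per logged error, capped at MAX_COUNTS."""
    return {
        core: min(errors * step, MAX_COUNTS)
        for core, errors in errors_per_core.items()
    }


# Hypothetical error counts per core index.
errors = {0: 0, 3: 2, 7: 9}
print(suggest_offsets(errors))  # {0: 0, 3: 10, 7: 30}
```

In practice you would probably raise one core at a time and retest rather than apply a formula, but the sketch shows the per-core idea.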

0 Likes