
PC Processors

steveoeditz
Adept II

AMD 5900X WHEA Errors

The problem is as follows: the system boots and works great. I can watch YouTube, edit in Resolve, and use Photoshop. As soon as I launch a game, I can only play for about 15 minutes.

 

My system:

CPU: 5900X

GPU: 6800XT

PSU: Be Quiet Dark 1000W

RAM: Patriot Viper 4133, 4x8GB (currently set to 3533 with fabric clock 1733; no DOCP, just set to Auto)

Motherboard: Asus TUF X570 WiFi Plus with the newest BIOS (4021) installed

I have it overclocked, and everything in the BIOS is set to Auto. I have enclosed pictures as well.

I have made numerous changes, such as raising SoC and Vcore voltage, disabling C-states, and dropping the RAM to 3200, and it is still crash after crash. I know I'm going to have to buy another 5900, but what can I do in the meantime just to be able to play a game? Is there anything at all that can be done? Or should I just swap my 5800 in and pray it doesn't do the same thing? I do have a spare motherboard as well, but I haven't read about anyone really having motherboard issues. I bought this CPU brand new from another party, and yes, while I have the receipt, I believe they would have to start the RMA process, and I'm sure that's something the other party wouldn't be interested in doing.

[Attachments: Screenshot 2021-08-27 010849.png, Screenshot 2021-08-27 010241.png, Screenshot 2021-08-27 010305.png]

0 Likes
92 Replies
EFermi
Miniboss

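The memory clock and the fabric clock (FCLK) should be the same value. DDR memory transfers twice per clock, so the memory clock is half the advertised rating: with 3600 RAM the memory clock is 1800MHz, and FCLK should also be 1800MHz. Also, try setting 3200 (memory) / 1600 (FCLK).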
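A minimal sketch of that 1:1 rule in Python, in case it helps when trying other speeds. The function name is made up for this example.

```python
def matched_fclk_mhz(ddr_rating: int) -> float:
    """Fabric clock (MHz) that runs 1:1 with memory of the given DDR rating."""
    # DDR transfers data twice per clock, so the real memory clock
    # is half of the advertised rating (e.g. 3600 -> 1800 MHz).
    return ddr_rating / 2


for rating in (3200, 3600):
    fclk = matched_fclk_mhz(rating)
    print(f"{rating} memory: memory clock and FCLK both at {fclk:.0f} MHz")
```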

0 Likes

Yeah, tried that as well.

0 Likes
linkz0rz
Adept I

I had a similar issue. First, check whether the issue goes away when you disable Core Performance Boost (CPB) in the BIOS. It should be in one of the CPU menus; it may take you a while to find depending on your motherboard. This should cap the CPU at its base clock (3.7GHz on a 5900X). Leave DOCP on so the RAM runs at its rated speed, then boot into Windows, run a game, and see if it is stable. If it is stable, then the BIOS is the next thing to look at. Did you recently update to version 4021? Was it okay before? You could try reflashing the BIOS in case the last flash did not go through cleanly, or try flashing back down to version 4005 or 4002, as those might be more stable.

0 Likes

I actually flashed back to the previous version to see if the issue would go away, and it did not. I'm not really excited about turning off CPB, but if that's what will keep it stable until I get a replacement, then so be it.

0 Likes

If disabling CPB works and the system is stable, you could try this next. Re-enable CPB and install ASUS AI Suite 3, as that comes with easy CPU overclocking in Windows and you're on an Asus mobo anyway. In AI Suite 3, go to the CPU section, drag the bar to 3.7GHz, leave everything else alone, apply it, and test with benchmarks, games, etc. If it is stable, move it up in increments of 100MHz until it's no longer stable; the last stable step is your max overclock without the kind of tweaks that could fry the CPU. I got mine to 4.2GHz, which gave me quite the speed boost until I managed to fix my situation. It might give you the boost you need while you wait for the RMA or fix the issue.
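The step-up procedure above can be sketched as a simple loop. This is only an illustration: `is_stable` stands in for "run benchmarks and games at this clock and see if it survives", which no script can do for you, and `pretend_test` is a made-up stand-in so the example runs.

```python
from typing import Callable, Optional


def find_max_stable_mhz(start: int, ceiling: int,
                        is_stable: Callable[[int], bool],
                        step: int = 100) -> Optional[int]:
    """Raise the clock in fixed steps and return the last stable value.

    Returns None if even the starting clock is unstable.
    """
    best = None
    freq = start
    while freq <= ceiling:
        if not is_stable(freq):
            break          # first unstable step: stop and keep the previous one
        best = freq
        freq += step
    return best


# Hypothetical stand-in for a real stress test: pretend anything
# up to 4200 MHz passes, as in the poster's own experience.
def pretend_test(freq_mhz: int) -> bool:
    return freq_mhz <= 4200


print(find_max_stable_mhz(3700, 4800, pretend_test))
```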

0 Likes

So with this method I disable PBO and don't use Ryzen Master, I'm assuming?

That’s correct

0 Likes

So, a little update: I just loaded optimized defaults, basically putting everything at Auto, and I couldn't even stay in Windows for a minute or two. I have now disabled CPB, PBO, and C-states. Memory is at Auto; my motherboard only sees 2 profiles for my memory, 4133 or 4000, so I had to select Auto and set it to 3600 with fabric at 1800. After doing all this, with Ryzen Master open, my EDC is no longer at 0. It's actually showing a number now, and I don't know what that's all about. I did undervolt it by 0.0500; I'll probably adjust that to 0.0400 so I can get the 1.1 volts. So in order to use the ASUS AI Suite I would need to turn CPB back on, correct?

[Attachment: Screenshot 2021-08-27 211548.png]

0 Likes

Most people don't understand just how out of spec they are running their systems.

First, 3200MHz is the max in-spec speed for any memory in any motherboard supporting the Vermeer CPUs.

Second, the more memory ranks one runs, the tighter that spec becomes. If you install 4 sticks of single-rank memory, support drops to 2933MHz. Go to 4 sticks of dual-rank (which I imagine you have), and support drops to 2667MHz.

These are AMD specs. The motherboard and memory pushers can't do anything about them. All one normally sees when a motherboard is advertised is the super (OC) speed and the best in-spec speed of 3200MHz. But they don't tell you that the spec drops off as soon as a third stick is installed. Some reliable manufacturers (like ASRock and SuperMicro) will supply a table in their motherboard manuals.
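To make the drop-off concrete, here is a small lookup sketch in Python. It encodes only the three figures quoted in this post (3200 with up to two sticks, 2933 with four single-rank, 2667 with four dual-rank); the function names are hypothetical, and treating three sticks the same as four is an assumption based on the "third stick" remark.

```python
def in_spec_limit(sticks: int, dual_rank: bool) -> int:
    """Highest in-spec memory speed for a Vermeer CPU, per the figures quoted above."""
    if sticks <= 2:
        return 3200                      # best in-spec speed quoted in the post
    # Three or four sticks: the limit depends on rank (assumption: 3 behaves like 4).
    return 2667 if dual_rank else 2933


def is_within_spec(configured: int, sticks: int, dual_rank: bool) -> bool:
    return configured <= in_spec_limit(sticks, dual_rank)


# The original poster's setup: 4 sticks currently set to 3533.
for dual in (False, True):
    rank = "dual" if dual else "single"
    ok = is_within_spec(3533, sticks=4, dual_rank=dual)
    print(f"4 sticks, {rank} rank: limit {in_spec_limit(4, dual)}, 3533 in spec: {ok}")
```

Either way the rank question is answered, 3533 with four sticks is above the quoted limit.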

I loathe rebranding by memory stick pushers. Let's see what you fell for. Bring up CPU-Z and open the memory tab. What is the highest speed rating that you see listed for those components? Please post; I love a laugh. It's the constant typing that bores the heck out of me. I'm not trying to laugh at your expense, but the post would compensate me for the typing.

Hint: ECC memory never hurt anyone.

========   Recommendations =====

0. If you want, save your current profile (but I believe you will eventually be best off with the settings I'm about to give you).

1. Load Optimized Defaults.

2. Set memory speed to 3200MHz. Don't touch XMP or DOCP. And for heaven's sake get rid of ASUS AI Suite; it hasn't been reliable for a decade. Leave all timings on Auto.

Keep CPB and PBO enabled. Disabling CPB would unnecessarily gimp your system. I believe we can aim higher.

3. (Give a touch more voltage to VCore and VSoc.  The system will run with a range of voltage.  Some people are having trouble with WHEA errors when their system goes idle and it down volts the CPU.   So... let it run Normal but with just a wee bit more.)

Set VCore from Auto to Normal   

Now a differential field opens up.     Set the differential to add +.006V to Vcore.

Set VSoc from Auto to Normal 

Now a differential field opens up.      Set the differential to add +.006V to VSoc.     

4. Others, like yourself, have trouble when the system boosts to the very highest speeds.

What we did in step 3 boosts voltage at both the lowest and the highest frequencies. In your case we want to discourage the system from chasing the very highest frequencies while still boosting well beyond base. This is done by adjusting the power limit (PPT).

Set PBO to Advanced.

Set limits to Manual.

Leave TDC at 95 (Amps)

Leave EDC at 140 (Amps)

Set PPT down from 142 to 120 (Watts). This is what will stop you from boosting to exorbitant speeds and then crashing. (You really didn't want to push 142 watts into those tiny transistors, did you?)

5. While you are in the vicinity, let's get a handle on temperature. People seem to worry about that. I still hold to the old engineering rule of thumb that for every 10C one raises the temperature, the chip life is halved.

Set Thermal Throttle Limit to Manual

This should open up a new field also called "Thermal Throttle Limit"   Set it to the temperature you like.  I like 75 (Celsius)

The Ryzen is great at running within the limits you specify.  I could even take the fan off my heatsink and the system still won't go beyond the limits.   (It will just lower frequencies,  Multithreading, core dispatch, and voltages)

I think you will be pleased with these settings. Of course, lowering PPT and the temperature limit will cost some performance. However, it is a LOT better than turning off CPB and just running at base.

Run with these for a few weeks, before experimenting with pushing the limits on that memory.
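For anyone who wants the power-limit part of those recommendations in one place, here is a small Python sketch. The class and helper names are invented for this example, and the stock numbers are the ones quoted in this thread (PPT 142 W, TDC 95 A, EDC 140 A), not something the code verifies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PboLimits:
    ppt_watts: int   # package power limit
    tdc_amps: int    # sustained current limit
    edc_amps: int    # peak current limit


# Stock 5900X limits as quoted in this thread (assumption, not measured here).
STOCK_5900X = PboLimits(ppt_watts=142, tdc_amps=95, edc_amps=140)

# The suggestion: leave TDC and EDC alone and pull PPT down to 120 W.
SUGGESTED = replace(STOCK_5900X, ppt_watts=120)


def stays_within(limits: PboLimits, reference: PboLimits) -> bool:
    """True if no limit is raised above the reference values."""
    return (limits.ppt_watts <= reference.ppt_watts
            and limits.tdc_amps <= reference.tdc_amps
            and limits.edc_amps <= reference.edc_amps)


ppt_cut = 1 - SUGGESTED.ppt_watts / STOCK_5900X.ppt_watts
print(f"PPT reduced by {ppt_cut:.0%}")
print(f"Within stock limits: {stays_within(SUGGESTED, STOCK_5900X)}")
```

The point of the check is that these settings only ever lower a limit; nothing here is an overclock.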

Oh yes here is that table from ASRock.   (Remember these are AMD's specs for Vermeer)

Vermeer Memory.JPG

I'll give all of this a try. I never seem to have temp issues: when playing it may get to 55-60 C, but idle is usually 34. So if I do this and EDC returns to 0 again in Ryzen Master, is that just a glitch or something else? Now that I have basically stripped my system back to run at stock, I'm still getting WHEA errors, but the system isn't shutting off.

[Attachment: Screenshot 2021-08-28 001217.png]

0 Likes

I just noticed, that the Event Logger was  warning you of  Corrected errors. 

It wasn't the corrected errors that brought you down, but the uncorrected errors (2 bits or more flipped in error)

The only system that I know of that corrects itself is the memory subsystem.   Thus ECC.

However, I doubt that your Patriot memory is error correcting, which means that the errors are occurring within the internal memory controller on the Ryzen chip itself. Again, the most important values for this would be VSoc voltage and PPT, to hold back frequencies that your internal memory controller cannot handle.

Thank you for the post.  A picture IS worth a thousand words.

Let's see... Patriot took components that are rated for 2133MHz and 1.2V, cherry-picked them (LOL), told the user to push 1.4 volts, and then attempts to jump seven bins!

 4000(O.C.) / 3600(O.C.) / 3333(O.C.) / 3200 / 2933 / 2667 / 2400 / 2133 
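A quick way to check the "seven bins" count against the speed list above, as a Python sketch (the function name is made up for this example):

```python
# Speed bins from the list above, lowest to highest.
BINS = [2133, 2400, 2667, 2933, 3200, 3333, 3600, 4000]


def bins_between(rated: int, advertised: int) -> int:
    """Number of speed bins jumped going from the rated to the advertised speed."""
    return BINS.index(advertised) - BINS.index(rated)


print(bins_between(2133, 4000))  # 7
```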

I love Kingston ECC.

The Components are rated at the same speed as the sticks.

I can usually bump the speed up two bins without changing voltage, so my 3000MHz sticks get 3400MHz and my 3200MHz sticks get 3600MHz. But the best part is that memory errors from the DIMMs themselves will show up in Event Viewer. People who try to overclock without error correction are like people who drive race cars without brakes to save 7 pounds of material.

The system crashes almost immediately with the settings you suggested loaded.

0 Likes

If you are in PBO Advanced and entering the limits manually, I wouldn't leave any field blank. Specify exactly what you want.

I believe the defaults for the 5900X are TDC 95A, EDC 140A, PPT 142 Watts.

But set your PPT down to 120.

0 Likes

Did exactly that, and it crashes.

0 Likes

Is it crashing in BIOS?

What are your measured readings in BIOS?

0 Likes

Did exactly what you said, and it crashes. If you want, email me; this board won't let me reply right away, I have to wait, and it's annoying. Email me at justin.d.stevenson1@gmail.com

0 Likes

I see you've been fiddling with the RAM settings as per Gwillakers' recommendations. I went back and couldn't find how it went with the original approach of leaving everything at default and disabling CPB. Did it boot into Windows and stay stable? I read that you also undervolted the CPU; that's inserting unnecessary variables into the mix. If that undervolt is still in place, it might be causing your memory controller issues. Take it one step at a time: apply defaults, don't undervolt (leave it on Auto), disable CPB, and stability test. If all is good, slowly reapply the RAM settings, the overclock, etc. You're rushing into it. Slow down.

Honestly, I was still getting WHEA errors with everything disabled, and performance was horrible. I've contacted AMD. I think it's RMA time.

0 Likes

Honestly, I thought you wouldn't have trouble with the settings I gave you.

All would have kept you constrained within more conservative boundaries.

Since my settings failed you, I can see why you would have some doubt, but don't despair.

Be patient, and I believe we can tame this beast yet.

I agree with linkz0rz.     His approach is sound and warranted at this point.

You must get back to base, turn off all overclocking and move back into the forest slowly.

1. Load Optimized defaults,   CPB disabled, PBO disabled, XMP off, DOCP off.   

2. Reduce the number of memory sticks to two.    Make sure they are in different channels as specified by your Motherboard.

    (You could go all the way back to one, but I believe your problem is more with voltage and frequency to the Internal memory controller, not the Dimms).

3. When you go to these simplified, non-overclocked settings, observe whether your WHEA warnings go away.

      The ones I'm talking about are the ones that say "corrected". They are not bringing your system down, but they are the first indication that timings are not quite right.

 

0 Likes

I noticed your WHEA warnings before, but I just now saw that you were getting thousands per hour.

This happened to me a couple of years ago, (I don't think it applies here, but the more one knows the better)

I was changing a motherboard in a case (swapping an ATX in and taking an mATX out).

I forgot to move one of the standoffs, so it was left in the wrong position. It squished some pins on the underside of the board so that a couple were touching one another. This caused my system to run super slow, as the memory was correcting the errors but not bringing down the system. Straightening the pins removed the error and all was well.

Like I mentioned before, your system is not using ECC Dimms, so you won't catch the errors on the MB, but your processor still uses ECC internally.

It's not warranted yet, but you might eventually consider reseating the processor. Check carefully whether any of the underside pins are bent.

0 Likes

Sounds like faulty RAM. Do you have a different set to try? If Gwillakers is right, then that RAM could be the issue. If you've got a spare pair, you could try those out and see if it makes any difference.

0 Likes

Real quick: I have an MSI X570 Edge in my other rig. Would that be a better option than the Asus mobo?

0 Likes

If we are taking bets:

Bad RAM: 5% (Though I don't like his particular RAM, the Event log shows corrected hardware errors, which to me indicates that the data went wrong in the internal memory controller or the L1, L2, or L3 cache and was then corrected.)

Frequencies and Voltage settings for Internal Memory controller : 60%

Bad 5900X (including bent pins) : 18%

Bad Motherboard 17%

Keeping my fingers crossed on the RAM swap, but I don't expect the RAM to account for those errors. Non-ECC DIMMs do not store any error correction or checking data. The processor blindly accepts what the DIMMs give it. Once the data is in the processor, though, a 64-byte cache line undergoes a calculation that produces 8 bytes of check info before the cache line is stored. When the cache line is retrieved, the 64 bytes undergo the same calculation, and the 8-byte result had better match the 8 bytes of check info that were stored. Otherwise, you have an ECC error. If only 1 bit flipped, the hardware corrects it and logs it in one of the Event Viewer logs. If two or more bits flipped, the system is coming down.
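The correct-one-bit, give-up-on-two behavior described above is what a SECDED (single error correct, double error detect) code does. Below is a toy Python sketch using an extended Hamming code on 4 data bits instead of a 64-byte cache line. It illustrates the principle only; it is not the actual scheme inside a Ryzen cache, which the thread does not specify, and the function names are made up.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) % 2             # makes the whole word even parity
    return code


def decode(received):
    """Return (nibble, status); status is 'clean', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "clean"
    elif not parity_ok:
        # Odd number of flips: treat as a single flip and repair it.
        # Syndrome 0 here means the overall parity bit (index 0) itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity looks fine but the syndrome is non-zero: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1

print(decode(word))       # stored and read back unchanged
print(decode(one_flip))   # one bit flipped: repaired (the "corrected" log entry)
print(decode(two_flips))  # two bits flipped: detected but not repairable
```

As with real SECDED hardware, three or more flipped bits can slip through or be miscorrected; the guarantee only covers one and two.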

Oh by the way,  You need to wait 600 seconds to post again.  However I believe you can edit an old post immediately.

0 Likes

Well, I can switch out the RAM from my other rig and give it a go. The other motherboard runs flawlessly with the 5800X; the Asus was a headache from the get-go, and I had to undervolt just to get it to boot. As for bent pins, I'll check that as well. I just had high hopes for the Asus board since it was rated a little higher than the MSI.

0 Likes

Asus mobos are generally really good, so it's very likely not the mobo, but you never know. Try out the MSI just to rule out the motherboard.

0 Likes

I'd be surprised if it was. I'll keep you all posted. I appreciate all of your help.

0 Likes

Update: no bent pins. I reseated the CPU and swapped the RAM, so I now have 2x8 gskill trident zero. I will boot up and let you know where I'm at after that.

0 Likes

You don't have to wait for a crash.   One does not have to pay attention to all Event Viewer warnings.

See if you are still getting the warnings of corrected errors.

While not all Event Viewer warnings merit one's attention, machine-check corrected errors do.

One should not live with those warnings. Either the cause gets corrected or the part goes back for RMA.

0 Likes

OK, memory swapped, and the DOCP profile actually applied and seems to be stable. I still have a boatload of WHEA errors, so where shall we start?

0 Likes

WHEA errors that are "machine check - corrected" are NOT good. They cannot be ignored.

They must be eliminated; otherwise it is just a matter of time before you crash again.

If you are running DOCP turn it off.   It is an overclock.   First we must get this running inside specifications, without any machine checks.   CPB should be disabled as well as PBO.    We are not trying for performance, just stability without those machine-check errors.

By the way, you don't have to wait 10 minutes to edit an old post.   Easier to have a running conversation

Take a look at your event viewer.   You should see a separation of entries by time.   Take a pic of the entries just before the new set of entries, (where it complains you didn't shut down normally)

How long was it before you crashed? Were you in Windows or BIOS? Was the system stressed or somewhat idle?
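If scrolling through Event Viewer gets tedious, the same WHEA entries can be pulled from the System log with the built-in `wevtutil` tool. The Python sketch below only builds the command and runs it when actually on Windows. It assumes the events come from the `Microsoft-Windows-WHEA-Logger` provider and that event ID 19 is the "corrected hardware error" warning (18 being the fatal one), which is how they normally show up as far as I know; check the IDs in your own log before relying on it. The function name is made up.

```python
import platform
import subprocess

WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_query_command(event_id=19, max_events=20):
    """Build a wevtutil command listing the newest WHEA events with the given ID."""
    xpath = (f"*[System[Provider[@Name='{WHEA_PROVIDER}'] "
             f"and (EventID={event_id})]]")
    return ["wevtutil", "qe", "System",
            f"/q:{xpath}",        # filter by provider and event ID
            f"/c:{max_events}",   # limit the number of events returned
            "/rd:true",           # newest first
            "/f:text"]            # plain-text output


if platform.system() == "Windows":
    result = subprocess.run(whea_query_command(), capture_output=True, text=True)
    print(result.stdout)
else:
    # Not on Windows: just show the command that would be run.
    print(" ".join(whea_query_command()))
```

Comparing the timestamps of the last few entries before an unexpected shutdown is the same exercise as the screenshot request above, just in text form.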

0 Likes

[Attachments: Screenshot 2021-08-28 203912.png, Screenshot 2021-08-28 203942.png, Screenshot 2021-08-28 204019.png, Screenshot 2021-08-28 204044.png]

So it crashed, and I now have everything disabled and I'm at stock 3.7GHz, and I have no WHEA errors at stock.

0 Likes

How long did you run before the Crash?    Were You in Windows?     Do you ever crash in BIOS? (they use different voltages, frequencies, and timings when in BIOS)

When you crash, is the system somewhat idle or more busy?

Just a nit,  I don't think it will matter, but your memory clock says 1800MHz.

We are having memory related problems, so this must be set down to at least 1600MHz so we don't run the Dimms faster than 3200MHz.

Your voltages, frequencies and temps look good.  All seem to be in range for a non-overclocked system.

0 Likes

It crashes 20 seconds after I'm in Windows. It won't crash in BIOS at all. It's been almost 37 min with no WHEA errors at stock. I'm about to swap the 5800X in.

0 Likes

The positive news is that you are getting far fewer Machine-corrected errors.

Which means, slowing down the system helped.

You may think, "but I still came down." However, a memory error is an error, whether it is corrected or not.

Any error could, if uncorrected, bring a system down.   Now there are fewer opportunities.

It all comes down to waiting.     (Hope the simplicity of this doesn't offend you)

Three types of Waiting...

     1.   CPU can poll...    Are you done?    Are you done?....   etc.     Easy to program, but wastes CPU resources.

     2.   Wait and interrupts.    The CPU hands off work and goes off to do other things; when interrupted, it saves state, so it can multitask.

     3.   By contract or mutual agreement.      Both the CPU and memory are very busy pieces of machinery.   For the most part they cannot be bothered by each other.    The CPU has work for the memory subsystem.   The CPU is faster than the memory, and the CPU waits for the memory.

Imagine two people (CPU and Memory). They reside in two different rooms and don't see one another. The rooms share a common hallway. There is a light in the hallway and a guy to flick it on and off. When the CPU wants to put something in storage, he leaves it in the hallway. Memory needs a certain amount of time to put things away. CPU and Memory come to the agreement that the guy at the end of the hall will flick the light on 12 times a minute. They also agree that the CPU will give Memory 6 flicks of the light to store something, and after that the CPU can assume it was done.

Now if the memory is an old fat guy like me, I'm gonna need every flick of that light to do my stuff. But if Usain Bolt is doing the storing, he might not need 6 whole flicks of that light, and you would not be taking full advantage of Usain's speed. If you come to an agreement to store things in 5 flicks, the CPU wouldn't have to wait as long and would get more accomplished running at the same frequency. Heck, you might even tell the guy at the end of the hall to flick that switch faster. The only problem comes when you don't give the memory subsystem enough time. Then you get garbage and crashes.
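The hallway analogy boils down to simple arithmetic: wait time equals the number of agreed ticks divided by the tick rate. The Python sketch below works through the analogy's own numbers and then applies the same formula to memory, where the "flicks" are clock cycles. The CL16 timing in the second half is a made-up illustration, not a value from this thread.

```python
def wait_seconds(flicks_needed, flicks_per_minute):
    """How long the CPU waits in the hallway analogy."""
    return flicks_needed * 60 / flicks_per_minute


def cas_latency_ns(cl_cycles, ddr_rating):
    """Same idea for real memory: cycles divided by the memory clock.

    The memory clock in MHz is half the DDR rating, so one cycle
    lasts 2000 / ddr_rating nanoseconds.
    """
    return cl_cycles * 2000 / ddr_rating


print(wait_seconds(6, 12))   # the original agreement: 6 flicks at 12 per minute
print(wait_seconds(5, 12))   # tighter timing: fewer flicks, same light
print(wait_seconds(6, 15))   # faster light: same flicks, higher frequency

# Hypothetical timings for illustration only (CL16 is an assumption).
print(cas_latency_ns(16, 3200))
print(cas_latency_ns(16, 3600))
```

Both knobs shorten the wait, and both fail the same way: if the memory actually needs more time than the agreement gives it, the CPU picks up garbage.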

The AMD spec is 3200MHz.     That's why I'm such a nudge when people have memory problems.  They want to believe the package the memory came in, but ignore the Specs of the processor.

0 Likes

But where do we go from here? I haven't received any errors at stock, and if this CPU out of the box can't even run at its boosted clock speeds and I have to handicap everything just to be stable, then I wasted my money, lol. I have the 5600, the 5800, and now this, and I never had issues with those processors, just the 5900X. I'm down for some tweaking, but going back and forth on this message board takes forever. I can be reached on Twitter @steveoeditz and we can DM each other before I swap the CPU out for the 5800 and just RMA this 5900.

0 Likes

Don't assume that all these things which are disabled will stay disabled.

Once we get a handle on what exactly is bringing you down, it might be an easy, painless correction.

Being stable in BIOS is a good thing.   It proves that there are settings that work.

In my original post, I was trying to limit running at the highest frequencies by lowering PPT.

I'm curious: what happens if you edit your Windows power plan and change the maximum processor state to, say, 80%?

See if that keeps you in windows longer.

0 Likes

So I have the BITSUM power plan that was installed when I got my PC optimized. However, the WHEA errors came long before the optimization, so are you saying I need to change it to Balanced?

[Attachments: Screenshot 2021-08-28 222407.png, Screenshot 2021-08-28 222045.png]

0 Likes

Well, doesn't a minimum processor state of 100% strike you as odd?

I would recommend a minimum of 15%.

And for now set the maximum to 85% (we will turn that back up later).
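The same two values can also be set from an elevated command prompt with `powercfg` instead of clicking through the power plan dialog. The Python sketch below only builds and prints the commands; it does not run them. It assumes the standard `powercfg` aliases (`SCHEME_CURRENT`, `SUB_PROCESSOR`, `PROCTHROTTLEMIN`, `PROCTHROTTLEMAX`), which `powercfg /aliases` should list on your system, and it only touches the plugged-in (AC) values. The function name is made up.

```python
def power_plan_commands(min_percent=15, max_percent=85):
    """Build powercfg commands that set min/max processor state on the active plan."""
    if not 0 <= min_percent <= max_percent <= 100:
        raise ValueError("need 0 <= min <= max <= 100")
    base = ["powercfg", "/setacvalueindex", "SCHEME_CURRENT", "SUB_PROCESSOR"]
    return [
        base + ["PROCTHROTTLEMIN", str(min_percent)],   # minimum processor state
        base + ["PROCTHROTTLEMAX", str(max_percent)],   # maximum processor state
        ["powercfg", "/setactive", "SCHEME_CURRENT"],   # re-apply the plan
    ]


for command in power_plan_commands():
    print(" ".join(command))
```

Setting the maximum back to 100 later is the same call with `max_percent=100`.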

0 Likes