
Processors

introibo
Adept I

5950x WHEA Errors (New PC Build)

Hi Everyone!

I just built a new PC and I'm getting random WHEA UNCORRECTABLE ERROR(s). Sometimes it happens immediately on startup. Sometimes it will happen after an hour of gaming. This error has happened to me at least 20 times in 2 days. The performance is great, but my system is not stable enough for me to use this as a full time PC yet.

I've seen others having issues with their 5950x processors and this issue.

Things I've tried to fix it:

1. Updated the BIOS

2. Toggled PBO off in BIOS 

3. I can't enable DOCP profile, the system won't POST at 4000 MHz settings

4. Ran Memtest, Hard Drive tests, and the GPU is great. Everything passes.

5. Used ASUS AI Suite 3 to Optimize the CPU usage for PC stability. No luck there either.

6. Most recent fix attempt - manually adjust the EDC current limit to 200A. This apparently worked for someone on reddit.

I suspect it's the processor. Is there a solution to this?

Specs

CPU - 5950X with ASUS Dark Hero VIII motherboard

RAM - 32 GB (8x4) GSkill Trident Z 4000MHz

GPU - ASUS Strix 3090

HDD - Seagate 16TB Exos

SSD - Samsung QVO 4TB

M.2 2TB WD Black SN850

PSU - 850W EVGA G3

 

Thank you for your time!

 

0 Likes

22 Replies
crayraven
Elite

I'm having the same issue, as are many others. I do not think these processors are faulty; I think it's something else. There are many band-aid fixes, but none of them really work. Even RMA doesn't work, according to some people. AMD is very quiet about the issue, but I believe it's just too big to ignore.

Everyone has a different setup with different parts, but everyone is being plagued by this issue. The only thing in common is an AMD CPU and motherboard.

0 Likes

Odd that there are so many affected but no real fix for it yet. It seems that some have stable setups too, but I'm not sure what they're doing to achieve that stability.

0 Likes

They are probably not stable. Some people go long periods without the WHEA 18 error, only for it to come back months later. I honestly think the issue will be hard to nail down. There seem to be two common scenarios: it happens when the PC is idle or while it's gaming, usually in a game that isn't very CPU heavy.

0 Likes
neotax
Adept I

WHEA errors are mostly caused by memory/FCLK clocks that are too high. It is not for nothing that AMD officially supports only 3200 MHz RAM.

Reduce your RAM clock to 3600 MHz, which works with almost every CPU. Since you also have 4 RAM sticks, I would turn on GDM (Gear Down Mode) for stability and use the Ryzen DRAM Calculator to set the timings to the safe preset.

0 Likes
EFermi
Miniboss

Disable XMP on your memory (might be called differently on AMD, but I don't remember the name of the function and don't feel like reloading to go into BIOS and look) and let it run on defaults, see if this fixes the problem.

If it does, manually change memory to 3200MHz and FCLK to 1600MHz.

Since you wrote your memkit is 4000 I assume you tried running it at that clock, maybe even with 2000FCLK. This is not something many systems are capable of, I'd say it requires some rare luck to achieve, which you obviously don't have. 
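To make the memory and FCLK numbers in this reply concrete, here is a minimal Python sketch of the 1:1 relationship it relies on. It assumes the usual DDR convention that the advertised "MHz" rating is really a transfer rate equal to twice the memory clock, and that a 1:1 setup runs FCLK at that same memory clock. The function name is made up for illustration.

```python
def one_to_one_clocks(ddr_rating: int) -> dict:
    """Return the memory clock and the matching 1:1 FCLK (both in MHz)."""
    # DDR memory transfers data twice per clock cycle, so the real
    # memory clock is half of the advertised rating.
    memclk = ddr_rating / 2
    # In a 1:1 configuration the Infinity Fabric clock equals the memory clock.
    return {"memclk_mhz": memclk, "fclk_mhz": memclk}


for rating in (3200, 3600, 4000):
    clocks = one_to_one_clocks(rating)
    print(f"DDR4-{rating}: MEMCLK {clocks['memclk_mhz']:.0f} MHz, "
          f"FCLK {clocks['fclk_mhz']:.0f} MHz")
```

This is why 3200 pairs with FCLK 1600, and why a 4000 kit run 1:1 would need the FCLK 2000 that the reply calls rare.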

0 Likes

Thanks for the reply! I've set the memory to 3200 and was still experiencing WHEA errors.

I'll take a look at the FCLK clock as well.

It seems to be processor related, since I disabled Core Performance Boost (effectively limiting the CPU clock to 3.4 GHz and no higher). Ever since disabling the boost I haven't had a crash, BUT the performance is significantly lower. I'd like to be able to use the boost.

0 Likes
jhjm40
Adept I

CPU is a bad product.
It is a defective manufacturing of a particular core.

0 Likes

One would think that most people would prefer to run in-spec and be stable than to run outside of specifications and live a horrible life of intermittent, or even reliable, crashes.  Well, there are those people who would rather run 1% faster and have memory that doesn't make an effort to correct itself, or even bother checking.

You suspect your problem is with the processor, I suspect you can find the answer in the mirror.  However, you didn't come here for sarcasm, you came for help.  So here it is.

First, don't do everything that is suggested on the internet.  The Lost flock to these places.  It is the blind leading the blind.  No offense, blind people.  Most would not be able to tell a software error from a hardware error, a bad BIOS setting, or even environmental factors such as heat, EMI or static.

1. Make sure you have the latest BIOS flashed.

2. You are currently lost in a forest.  Get yourself back to square 1.   Go into BIOS and Load Optimized Defaults.  Well trained engineers and lab rats have worked for months trying to pick just the right settings for you and your chosen equipment. For the most part, trust them.   They know a whole bunch more than the people you find here, me included. They might not get every setting right, as there are billions of combinations of hardware, but they do a pretty good job most of the time.

Let's see, what are they working with here?   Ah yes, the always in fashion Trident Z!    Hmmm  32Gig.... and fast 4000MHz.  They must have been expensive.  Too bad they weren't error correcting, would have been handy here.  (I don't even have to look up the specs, chuckle).  I'm making a guess that this user doesn't care about reliability, he/she just wants a fast toy. Memory without error correction is similar to a race car without brakes.  Most people do not realize that the early errors thrown by ECC can help one in tuning the system.  They would rather wait for the long-ignored uncorrected errors to corrupt their data or bring down their system.

3. Resist the urge to turn on XMP or DOCP.   There will be time for that later.  Both XMP and DOCP are overclocking. One may feel that it is just the RAM, but it includes the internal memory controller on the processor chip itself.  To be in spec, one must not run the memory any faster than 3200MHz.

4. Turn off Core Performance Boost (CPB).  Turning it off will cost you an appreciable amount of performance.  There will be time to turn it back on later.  First you have to run within specs.  Here is where the motherboard engineers failed you.  They should have shipped their product with CPB off.  However, they don't want to look bad when compared to the competition. CPB is overclocking, and you should run stable for a good period of time before employing it.   Otherwise, one might be tempted to join a chorus of wailing cry babies.  It is only when one has a period of stability with an in-spec processor that one can know that future failures with overclocking are a failure of the settings and not of the processor itself.

***  OK, now let's tackle some specific types of errors users might be encountering

A.  My processor is running Too HOT!

       This is a very easy fix.   AMD allows it to run up to 90C.  One is probably uncomfortable with that.  I will still stand by the old adage that for every 10C rise in temperature, the chip's life is cut in half.  Ryzen does a wonderful job of running within the temperature limit that you specify.  It checks sensors a thousand times a second, and makes adjustments to core frequency, the number of cores dispatched, the number of threads running in a core, and voltages.  All to keep you happy. This is the most direct way of controlling temperature.   Heck, it even keeps itself within the limits when I remove the fan!

 Set PBO to advanced,  Set PBO Limits to Manual. Set Thermal Throttle Temp to Manual.   Set Thermal  throttle temp to 75 or whatever temperature you desire.  
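The "every 10C halves the chip's life" adage quoted above can be written as a small formula. This is a rough sketch of that rule of thumb only, not a reliability model from AMD. The function name and the choice of 75C as the reference point are assumptions made for illustration.

```python
def relative_lifetime(temp_c: float, reference_c: float = 75.0) -> float:
    """Expected lifetime relative to running at reference_c.

    Implements the rule of thumb that every 10 C rise halves lifetime:
    ratio = 2 ** ((reference - temperature) / 10).
    """
    return 2 ** ((reference_c - temp_c) / 10.0)


for temp in (65.0, 75.0, 85.0, 90.0):
    print(f"{temp:.0f} C -> {relative_lifetime(temp):.2f}x the lifetime at 75 C")
```

Under this adage, running at the 90C limit instead of a 75C throttle target would leave roughly a third of the lifetime, which is the reasoning behind lowering the thermal throttle temperature.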

B.  My System crashed while idle!      WHEA...yada..yada

      This is normally a symptom of a core getting too little voltage.    A chorus of users:  But I ran Prime!   I ran Memtest!   They cannot understand why, if it passed these torture tests, it would crash while idle.    Often systems fail when idle because the system is only giving the cores minimum wattage.   That wattage might be good when the chips are cool, but it might not be enough if your chips are hot.   It takes more voltage to push current through a hot chip than a cold chip.

Here I like to let the system give the cores the voltage they normally would but with just a little more.  The two most important parameters would be VCore (The voltage to the cores) and VSoc (The voltage to the internal memory controller)

VCore normally runs 0.2V to 1.4V    (It should only get into the 1.4V range while boosting one or two cores, otherwise 1.3ish volts)

VSoc should stay somewhere around 1.1V  to 1.2V

Turn VCore from Auto to Normal.    A differential field would present itself.    Take the smallest positive differential  .006V

Turn VSoc from Auto to Normal.    A differential field would present itself.    Take the smallest positive differential  .006V

When systems are unstable, but can still post, normally it takes only the smallest amount of extra voltage to make them stable. However, sometimes one might require  a tiny bit more.   say .012V to Vcore and/or VSoc

C.  My system crashed while playing a Game

This is a symptom of your system, not being able to keep up with your expectations. You might be able to throw more voltage or current at it to correct the situation, but you might be uncomfortable pushing those values higher.

What I am about to recommend is not mutually exclusive to Problem B above.  You can implement both solutions.    You see, there is a range of voltage the system supplies the processor during low speeds and high.   In problem B above, it added a tiny bit of voltage to the low range(which was needed) as well as the high range (which might not have been needed)

We now address the problem where your processor might be boosting to frequencies that it cannot handle. Turning Power (PPT) down will discourage the processor from selecting the higher frequencies.

PBO should be set to Advanced.  PBO Limits should be set to Manual.  PPT (For Ryzen 5900 and 5950) can be set to 120W
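To put the suggested 120W PPT in perspective, here is a short Python sketch comparing it with the stock limit. The 142W stock figure is not from this thread: it is an assumption based on the commonly cited stock PPT for 105W TDP AM4 parts such as the 5900X and 5950X, so check your own BIOS or monitoring tool before relying on it.

```python
# Assumption: stock PPT for 105 W TDP AM4 processors (not stated in the thread).
STOCK_PPT_W = 142.0


def ppt_reduction(manual_ppt_w: float, stock_ppt_w: float = STOCK_PPT_W) -> float:
    """Fraction by which a manual PPT limit cuts the socket power budget."""
    return (stock_ppt_w - manual_ppt_w) / stock_ppt_w


cut = ppt_reduction(120.0)
print(f"PPT 120 W is about {cut:.0%} below the assumed stock limit")
```

Under that assumption, the manual limit trims the power budget by roughly 15 percent, which is what discourages the highest boost frequencies.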

These settings should allow you to run cool and stable.   Run with them for a period of time.  Your scores will come when you turn Core Performance Boost back on.   As well as XMP.    While you might see a bit of a performance drop with CPB off, most of it should come back when you turn it on again.

Please, if you can, disregard the sarcasm, it is the only thing that keeps me entertained as I type this. 

I appreciate your post.  I write software, but I am so ignorant when it comes to some of the engineering side of hardware.  I have built a system every 5 years for the last 20 years but never really dived into the details.  For the WHEA 18 idle issue, option B, many on here talk about the curve optimizer + 5-10 volts.  Can you explain the difference between that and your method?  Also, what is the cost of these methods in performance, roughly speaking?  Are we talking about a 2% loss in performance or like a 40% loss?

Thanks in advance.

0 Likes

So, an update: I got the 5900 RMA approved, and in the meantime purchased a 5950 and a B550 Unify motherboard. On the motherboard I used both power connectors at the top, and it has a separate power connector for the PCI, so I have that plugged in as well. No WHEA errors at all, none. The system runs great. I don't know if it's due to the design of the previous motherboard, but I'm willing to bet something was wrong with the chip as well as with the previous motherboard's power delivery design.

The 5950x should be better silicon and should give you less trouble.

Good luck.

0 Likes
introibo
Adept I

[SOLVED] FIX - CPU was defective. Returned it and got a new one. New CPU works perfectly, no errors.

With the old CPU I could only get it stable if I disabled Core Performance Boost and Precision Boost Overdrive. Effectively handicapping my 5950x to 3.4 GHz. Performance was greatly reduced. While it was handicapped I could still get my RAM clocked to 3600 MHz and it was stable.

I've seen most people with permanent fixes for this issue either RMA the CPU or return it and get a new one. The new one works!

Hopefully this saves you headaches and hours of trouble!

Congratulations,  

Now it's time for some well-deserved fun.

Thanks for the detailed troubleshooting reply earlier!

0 Likes

Hey Introibo,

 

Did amd give you an advanced RMA?

 

Thanks

0 Likes

No, I purchased from Amazon. I figured the RMA would take up to 5 weeks so I returned the defective unit to Amazon and purchased a new one via Amazon Prime. Got the new CPU in 2 days and had it installed instantly.

Amazon is processing a refund on the defective CPU.

Not sure if it's an advanced RMA or not since it's from Amazon.

 

 

0 Likes

If anybody is having this same issue, I have found a way to get the processor working. I know it sounds counterintuitive, but if you want the CPU to stop suffering BSODs with WHEA_UNCORRECTABLE_ERROR messages, overclock your CPU. For real, overclock the 5950X or 5900X. What I've found with the ones I've come across is that whenever the CPU hits its max turbo boost speed of 4.9 GHz and stays there, it causes errors and crashes the PC. I spoke to a few PC buddies of mine, and they say it is because at those frequencies the CPU has trouble accessing the L3 cache, so it throws a bus error, resulting in the WHEA_UNCORRECTABLE_ERROR BSOD.

So what do we do? We overclock the CPU. This sets the CPU's new top frequency for all cores to that overclock, and you can dial it in gradually to get to a frequency as close to 4.9 GHz as you can. I have seen friends hit 4.7 GHz with liquid cooling/AIO coolers, stable under testing and gaming. I've had my own 5950X overclocked to 4.2 GHz, and it runs faster than stock at multithreading, losing only slightly at single threading, but not by a large margin. I tested with benchmarks like Cinebench R23 and some game benchmarks. Mine used to crash in any game, but at a 4.2 GHz overclock it runs smooth as butter. I recommend using a GUI overclocking tool like ASUS AI Suite 3 for easy overclocking once it boots into Windows, but you can apply it from the BIOS if you wish. It is a good workaround if you are stuck with your chip and cannot perform an RMA or sell it.

0 Likes

I'm trying to understand how overclocking a cpu that once reaches a certain frequency throws whea errors and crashes? How would this change anything? Did you do a all core overclock? If your 5950 can only reach 4.2 overclocked and a 5600 max boost is a 4.6 I'm trying to understand the logic behind this small yet soon to be detrimental gain. 

0 Likes

“I'm trying to understand how overclocking a cpu that once reaches a certain frequency throws whea errors and crashes?”

CPU runs at too high a frequency due to the boost that it could not maintain stable signal, and so it crashes.

 

“How would this change anything? Did you do a all core overclock?”

Yes it was all core overclock. When you overclock it affects all cores so every core was set to 4.2GHz

 

“If your 5950 can only reach 4.2 overclocked and a 5600 max boost is a 4.6 I'm trying to understand the logic behind this small yet soon to be detrimental gain.”

The 5600 will only boost to 4.6 on certain cores when running at stock. With the overclock, 4.2 is maintained across all cores, so single-thread performance is affected, but multicore performance is better than stock overall. The stock multicore frequency is usually lower and won't boost as high; for the 5950X it is around 3.6-3.8 GHz, so the overclock makes it run at 4.2 GHz on all cores instead and you get a boost in multicore performance. With the same overclock my 5950X cannot boost to 4.9 GHz on a single thread, but that protected the PC from WHEA errors.
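The trade-off described in this reply can be estimated with simple arithmetic. The Python sketch below uses only the clock figures quoted in the thread (4.2 GHz all-core overclock, 3.6-3.8 GHz stock all-core, 4.9 GHz stock single-core boost) and assumes, as a rough simplification, that performance scales linearly with clock speed. Real benchmark results will differ.

```python
def clock_change_percent(new_ghz: float, old_ghz: float) -> float:
    """Percentage change in clock speed going from old_ghz to new_ghz."""
    return (new_ghz - old_ghz) / old_ghz * 100.0


OVERCLOCK_GHZ = 4.2                 # all-core overclock from the thread
STOCK_ALL_CORE_GHZ = (3.6, 3.8)     # stock all-core range from the thread
STOCK_SINGLE_BOOST_GHZ = 4.9        # stock single-core boost from the thread

for stock in STOCK_ALL_CORE_GHZ:
    gain = clock_change_percent(OVERCLOCK_GHZ, stock)
    print(f"All-core vs {stock} GHz stock: {gain:+.1f}%")

loss = clock_change_percent(OVERCLOCK_GHZ, STOCK_SINGLE_BOOST_GHZ)
print(f"Single-core vs {STOCK_SINGLE_BOOST_GHZ} GHz boost: {loss:+.1f}%")
```

By this estimate the fixed overclock raises the all-core clock by roughly 11 to 17 percent while lowering the peak single-core clock by about 14 percent.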

That comment is also pretty out-of-date now, as I updated to the latest BIOS with the latest AGESA 1.2.0.3 with patch C and the issue is now gone, so I have it on stock now, boosting to 4.9 GHz without trouble.

0 Likes

How do you RMA? I cannot find any information on how to RMA a CPU. They hide the instructions on purpose lol. I cannot find anything about how to RMA an AMD CPU. 

0 Likes
koguma
Adept II

You reached the same solution as I did, RMA.    That solved these issues as per this thread:  https://community.amd.com/t5/processors/ryzen-5900x-system-constantly-crashing-restarting-whea-logge...

No issues at all after RMA.

 

0 Likes
stereo55
Adept II

My first 5900X (bought when they were first released) had some WHEA errors, but more than that, it would not run any game or program without a crash, freeze, or reboot under minor loads (overclocked or stock). Mine was still within the 30-day exchange/return window for a new purchase, so returning it and buying a new 5900X fixed my problem without having to go through AMD warranty. I still have the replacement 5900X and am currently using it in this gaming rig. It's been excellent and problem free, and quite an excellent overclocker as well: 5 GHz+.

Nowadays, anyone having WHEA problems: JUST RMA IT!

0 Likes