
Processors

BobC0728
Adept II

5900X heading back for a second RMA over WHEA errors; really tired of this.

Had a bad core on the previous CPU that took forever to isolate as the cause; the errors were super random. After 5 months I requested an RMA, it was approved, and I received the replacement last week. My backup is a 5600X running stock settings except for XMP. It ran flawlessly in the machine for two weeks while I waited on the RMA and until I had time to install the new 5900X. I took the 5600X out this morning, put the new 5900X in, and tried to play a game. Stock BIOS settings except XMP at 3600 for my memory, just like the 5600X. Ten minutes into the game, the machine rebooted. Event Viewer showed WHEA Event ID 18 errors on three different cores. Tried again: 10 minutes, then a WHEA_UNCORRECTABLE_ERROR BSOD. Tried again... same story with the BSOD. I just started the RMA process. 25 years with Intel and not one bad CPU in that time. Really questioning my decision to switch. Two bad processors? Do they quality control these? Really ridiculous.

I read a bunch of posts about all the things I can do to stabilize the CPU..... no way. I paid $350 for this chip, and I want to use it as intended, not gimp it to keep it stable.

I thought AMD was a great company, but after my issues and their latest AM5 issues, I'm really questioning that.

johnnyenglish
Grandmaster

I understand that you may be tired of it, but have you taken the time here on the forum to narrow it down to the CPU?

Two bad CPUs in a row is super weird.
I've had well over... I don't even know how many, but out of 20+ AMD chips only one was DOA....

I don't know your system, so it's like flying blind. However, if this consumption chart (from Guru3D) is valid, the 3900X is normally power hungry.

If you pair it with a bad PSU and/or a weak board with a poorly designed VRM, that behavior could be expected.

Just my 2 cents

[Image: power consumption chart from Guru3D]

EDIT: I pulled the numbers for the 3900X, but power consumption is even worse on the 5900X.


The Englishman
BobC0728
Adept II

It's a Gigabyte X570S Aorus Master....very strong VRM.  PSU is a Corsair RM850x.  It is definitely the CPU.  The 5600x works perfectly in the same machine.


I was flying blind before, but I'm pretty sure you meant Gigabyte.

The Englishman
BobC0728
Adept II

Yup, Gigabyte. I fixed it; I got confused. I bought two boards while the first chip was on RMA. I wanted a board with more M.2 slots, ultimately settled on the Gigabyte, and returned the ASRock.

Does it make any sense that the machine is super stable with a 5600X and awful with a 5900X on the same settings? I know the 5900X draws more power, but... I estimate usage in the 550 W range, so an 850 W PSU is overkill.

Specs are:

  • PSU: Corsair RM850x Gold
  • Memory: Corsair 2x16 GB DDR4-3600, 18-22-22
  • Motherboard: Gigabyte X570S Aorus Master running the latest BIOS, F5b
  • VRM: 16-phase (14+2, 70 A MOSFETs for Vcore)
  • Chipset drivers installed from the AMD website, not Gigabyte's
  • GPU: MSI RTX 3080 running the latest NVIDIA drivers
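The ~550 W estimate can be sanity-checked with simple arithmetic. This is a rough Python sketch, not a measurement, and every wattage in it is an assumption: 142 W is AMD's default package power limit (PPT) for 105 W TDP Ryzen 5000 parts such as the 5900X, 320 W is NVIDIA's reference total graphics power for the RTX 3080 (10 GB), and 90 W for the rest of the system is a guessed round figure. It models sustained draw only; short GPU transient spikes are not included.

```python
# Rough full-load power budget for the build listed above.
# ASSUMED nominal figures, not measurements:
#   142 W: default PPT for 105 W TDP Ryzen 5000 CPUs like the 5900X
#   320 W: NVIDIA reference TGP for the RTX 3080 (partner cards may draw more)
#    90 W: a guess for motherboard, RAM, drives, fans, and pump combined
components = {
    "Ryzen 9 5900X (PPT)": 142,
    "RTX 3080 (reference TGP)": 320,
    "Board, RAM, storage, fans (guess)": 90,
}

psu_rating_w = 850  # Corsair RM850x

total_w = sum(components.values())
headroom_w = psu_rating_w - total_w
load_pct = 100 * total_w / psu_rating_w  # sustained load as % of PSU rating

for name, watts in components.items():
    print(f"{name:<36}{watts:>5} W")
print(f"{'Estimated total':<36}{total_w:>5} W")
print(f"Headroom on {psu_rating_w} W PSU: {headroom_w} W ({load_pct:.0f}% load)")
```

With these assumed figures the total lands at 552 W, close to the 550 W mentioned above, leaving roughly 300 W of headroom on the RM850x.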


Is the memory on the QVL, with the exact same part number?
Is it running at 3600? If so, try disabling XMP for troubleshooting purposes and see if it still BSODs.
Have you run MemTest86 for a couple of hours?
Use OCCT to stress specific components, such as the CPU or memory.

The Englishman

The memory is NOT on the QVL for any motherboard I have looked at, and the memory QVL for the Aorus Master X570S has not been updated since September 2021. I am not a big believer in memory QVL lists. When I bought it last November it said "optimized" for AMD processors, but it did not have many reviews, so I don't think motherboard companies ever really tested it, probably because Corsair did not make much of it.

When the last CPU was giving errors, they were mostly very random BSODs and status_access_overflow errors in Chrome. So I bought new QVL-listed 3200 RAM; the issue persisted, so I returned that RAM. From what I read about that earlier error, a specific core would cause the issue, but not as a WHEA error. I was able to isolate it using Ryzen Master by turning cores off and seeing when I got the errors. It took 5 days of testing, but I only got errors when Core 4 was on, which on that CPU was the gold-star core on its CCD... go figure.

Yes, the current memory is running at 3600; its stock speed is 2666. I tested the memory with MemTest86 multiple times with the last CPU, 4 passes each, and never got any errors. Slowing down the memory is not something I am willing to do. 3600 is the kit's rated XMP speed and well within what the motherboard supports.

I will run OCCT overnight tonight.

But.... I put the 5900X back in this morning and changed these BIOS settings based on another post:

  • Load Optimized Defaults
  • PSS / AMD Cool'n'Quiet: disabled
  • Global C-State Control: disabled
  • Power Supply Idle Control: Typical Current Idle
  • DRAM Power Down Enable: disabled
  • Turn on XMP
  • The post also said to disable Gear Down Mode for DRAM, but I could not find it, so I skipped that one.

So far I have been able to play a game for over an hour with no issues. Every attempt yesterday with default BIOS settings failed within 10 minutes, in the same game.


Just failed... Found Gear Down Mode and disabled it; trying again.


Running the memory at 3200 at the moment (down from 3600).

ThreeDee
Paragon

What cooler and case are you running? The 5900X is a bit more demanding on the VRMs, so they get warmer. If you don't have adequate airflow over the VRMs and RAM, that can cause issues. Might not be the case, but just throwing that out there.

I've run two 5900Xs on 3 different ASRock motherboards without issue. I currently run 2 AM5 setups, and they are screaming stable machines with Hynix RAM; I had issues with a Samsung RAM kit, though.

If you bump your RAM voltage up from 1.35 V to 1.36 V, or even 1.37 V, does that help at all? You're going from a single-CCD chip (the 5600X) to a dual-CCD chip (the 5900X), so that could be coming into play here, maybe.

As far as QVL this or that: if I've ever run QVL memory with my AMD stuff, it was by accident, and I generally don't have issues at all. I have an Oloy 2x16 GB 3600 1.35 V kit that I had to run at 1.36 V with my 5900X on my X570 Taichi to be stable.


ThreeDee PC specs

The case is a Lian Li 216 with the two big 160 mm fans. They are running, but the case is still open for troubleshooting after the recent rebuild.

The cooler is an Arctic Liquid Freezer II 280 mounted up top, which has the VRM fan... slightly gimmicky, but it does lower temps ever so slightly. Temperature is never an issue; the machine rarely gets over 70 °C. Gaming, it runs around 65 °C. Idle is around 36-39 °C.

I lowered the memory to 3200 and kept XMP on with all XMP settings on auto, and it didn't crash. Gamed for about 90 minutes... longest session yet.

I can try bumping the memory up to 1.36 V, but the weird thing is that the BIOS says it's set to 1.35 V while the actual reading is around 1.375 V. Is that normal?

What about setting the SoC voltage higher? HWiNFO shows it at 1.175 V at 3600 and 0.99 V at 3200.

I have two different WHEA 18 errors:

  • When it crashes, I get one or more "Cache Hierarchy Exception" errors across different, random APIC IDs.
  • On some of the crashes, I also get one "Bus/Interconnect Error", and it is always on APIC ID 0.
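One way to check whether these WHEA 18 errors point at a single core is to tally them by APIC ID. This Python sketch assumes you copy each event's error type and APIC ID from Event Viewer by hand into a list; the entries shown are hypothetical placeholders, not this system's logs. As a rough heuristic, not a diagnostic rule, errors that keep landing on one APIC ID point more toward a single bad core, while errors spread across many APIC IDs fit a shared cause (memory, SoC voltage, the fabric) better.

```python
from collections import Counter

# HYPOTHETICAL, hand-transcribed WHEA-Logger Event ID 18 entries.
# Replace these (error_type, apic_id) pairs with the values shown in
# each event's details in Event Viewer.
events = [
    ("Cache Hierarchy Exception", 4),
    ("Cache Hierarchy Exception", 10),
    ("Bus/Interconnect Error", 0),
    ("Cache Hierarchy Exception", 22),
    ("Bus/Interconnect Error", 0),
]

# Count how often each error type and each APIC ID appears.
by_type = Counter(error_type for error_type, _ in events)
by_apic = Counter(apic_id for _, apic_id in events)

print("Errors by type:", dict(by_type))
print("Errors by APIC ID:", dict(sorted(by_apic.items())))

# Share of all events that hit the single most frequent APIC ID.
most_common_id, hits = by_apic.most_common(1)[0]
share = hits / len(events)
print(f"Most frequent APIC ID: {most_common_id} ({share:.0%} of events)")
```

With a real log, a share close to 100% on one ID would support disabling that core or CCD as the next test; a low share spread over many IDs would not.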

Following the QVL is not an unbreakable rule but rather a solid suggestion.

It's not that the memory kit won't work; it will, but it was not validated and could give you problems.

I had a Corsair kit on the QVL, and even so, I had weird problems only in DX11 games: crashes, reboots, and BSODs. After loosening the latency a bit, adjusting some other timings, and adding a bit more vSOC, it went stable.

You could increase vSOC to 1.2 V at 3600 and see how it goes. I needed 1.1 V just to get 3200 working on a Ryzen 7 2700X. But don't push this voltage too much!!

Disabling XMP was purely for troubleshooting. It's not a permanent change, but it will clear up all our doubts. If the problem persists, then let's go after the CPU.

As for troubleshooting the CPU for, let's say, "a bad core": I would disable one CCD in the BIOS, if possible, and then the other.


Good Luck

The Englishman

Thanks for all the replies and suggestions.  Just really tired of this. Going on 6 months of trying to get a stable PC.

I understand what you are saying about the QVL. However, it does suck that Gigabyte has not updated the QVL for my motherboard since September 2021, almost two years, so basically they stopped testing memory for it. The 2x8 version of my memory is on the QVL, but not the 2x16 version; I have CMK32GX4M2Z3600C18. Yesterday I spent two hours trying to find memory to buy online that was on Gigabyte's QVL. The problem is that it is not in stock, probably because it is discontinued. I did find some G.Skill memory that is NOT on the Gigabyte QVL (because nothing is), but it is on AMD's QVL, and the G.Skill website lists my motherboard for it. Also, I found a lot of posts saying Corsair memory and AMD Ryzen don't seem to play nice together. I am starting to agree, and based on your post you had Corsair issues too. The G.Skill memory (F4-3600C16D-32GTZNC) has lower latency and says it is designed for X570 motherboards:

  • DDR4 3600 (PC4 28800)
  • Timing 16-19-19-39
  • CAS Latency 16
  • Voltage 1.35V
  • Compatible with AMD Ryzen 3000 Series CPUs & AMD X570 Motherboards
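The "lower latency" point can be put in numbers with the common first-word latency approximation: latency in ns = CAS latency × 2000 / data rate in MT/s, because DDR moves two transfers per memory clock. This Python sketch applies it to the two kits named above, plus the Corsair kit at 3200 under the assumption that its CAS latency stays at 18 (with timings on auto, the board may pick something different).

```python
def first_word_latency_ns(cas_latency: int, data_rate_mts: int) -> float:
    """Approximate CAS latency in nanoseconds.

    The memory clock in MHz is half the DDR data rate in MT/s, so one
    clock lasts 2000 / data_rate ns; multiply by the CAS latency in clocks.
    """
    return cas_latency * 2000 / data_rate_mts


kits = {
    "Corsair CMK32GX4M2Z3600C18 (CL18 @ 3600)": (18, 3600),
    "Corsair kit at 3200 (CL18 assumed)": (18, 3200),
    "G.Skill F4-3600C16D-32GTZNC (CL16 @ 3600)": (16, 3600),
}

for name, (cl, rate) in kits.items():
    print(f"{name}: {first_word_latency_ns(cl, rate):.2f} ns")
```

Under those assumptions the CL16 kit comes out a little under 9 ns versus 10 ns for the CL18 kit at 3600, and running the CL18 kit at 3200 without tightening timings raises it to 11.25 ns. This only covers CAS latency; the other timings and the fabric clock also matter.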

Anyone have an opinion on the memory?

I just cleared CMOS, reset to optimized defaults, turned on XMP, and lowered the memory speed to 3200. So far that is the only configuration that has not failed. Granted, it was only for 90 minutes, but it is all I have to go on at the moment. Going to try that for a few days.

BobC0728
Adept II

My goal for this build was to turn on XMP and undervolt the chip a bit with Curve Optimizer. That's it. Overclocking and pushing the boundaries is not something I want to get into; it is so time-consuming. I figured this would be easy. Boy, was I wrong.

BobC0728
Adept II

WHEA error at 3200. It took 6 hours, but it just happened. Turning off XMP now.


Have you tried a bit more SoC voltage? If it's at 0.9 V like you said, maybe 1.1 V.

The Englishman
BobC0728
Adept II

I gave up trying to stabilize this memory. At this point I am trying to see if this chip is garbage. If it stays stable with XMP off, I am getting some new 3600 QVL memory, and if it fails with the new memory, the chip is going back. I have read dozens upon dozens of memory reviews where you just turn on XMP and it works. I am not interested in messing with any more settings. This was the exact reason I built on AM4 instead of AM5: it is supposed to be a stable platform, with a mature BIOS, boards, and memory, and the bugs worked out.

BobC0728
Adept II

Make sure you plug your cables fully into the PSU....and like magic it solves all problems.  doh!


Glad you made it work.

Enjoy the CPU ;-)

The Englishman
BobC0728
Adept II

Nope....not the solution.  Could not get it stable after weeks.  New motherboard, new memory.  Messed with tons of settings in BIOS, but would always fail at some point.  AMD approved RMA yesterday.  Just sent it back.
