

Psynchro
Journeyman III

5950X Cache Hierarchy Error

I have a new system built and running seemingly well, apart from this "fatal hardware error", which has popped up at least twice (maybe more).

Admittedly, I have another PSU coming from Corsair that will hopefully fix a startup issue, reproducible 5% to 10% of the time, where the PSU has to be turned off and then on again before the system can power on. I'm hoping the cache hierarchy error I'm posting about here is also caused by the PSU, but I thought I should mention it regardless.

The latest occurrence of this error halted all my current work as the system powered off and on by itself. This happened in the past as well, but I wasn't sitting in front of the computer at the time.

My System Specs:
CPU - AMD 5950X
CPU Cooler - Lian Li Galahad 360 AIO
Mobo - ASUS ROG X570 Crosshair VIII Hero Wi-Fi (latest BIOS: 2502; XMP enabled (DOCP DDR4-3603 18-22-22-42); FCLK 1800 MHz; PBO enabled; PBO Max CPU Boost Clock Override 200 MHz; Windows power plan set to Ryzen Balanced (previously Ryzen Performance, I think))
PSU - Corsair HX1200
UPS - Cyberpower 1500VA (100% charge)
GPU - ASUS TUF GAMING RTX3090 24G
RAM - G.Skill Trident Z Neo 3600 32GBx2 C18 (my motherboard is listed on the QVL) part# F4-3600C18D-64GTZN

All the latest drivers are installed, including the latest chipset drivers from AMD (not Asus).

Please advise on how I can resolve this issue; it's been eating into my work productivity. I've since set PBO and PBO Max CPU Boost to Auto and set the Windows power plan to Ryzen Performance to ensure adequate voltage or whatnot. I'd very much like to be able to run this chip at its full capacity, as I've invested in significant cooling (nine 120 mm fans, including the three pulling air through the AIO radiator).

Here's some detail from the error in event viewer:
Source: Microsoft-Windows-WHEA-Logger
Date: 11/19/2020 3:46:06 PM
Event ID: 18
Task Category: None
Level: Error
Description: A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
<EventID>18</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2020-11-19T23:46:06.1955831Z" />
<EventRecordID>81121</EventRecordID>
<Correlation ActivityID="{a292dcc3-5fa1-49ae-b6ff-cc1349f2e997}" />
<Execution ProcessID="4780" ThreadID="5744" />
<Channel>System</Channel>
<Computer>KMW10X64PC00</Computer>
<Security UserID="S-1-5-19" />
</System>
<EventData>
<Data Name="ErrorSource">3</Data>
<Data Name="ApicId">0</Data>
<Data Name="MCABank">5</Data>
<Data Name="MciStat">0xbaa0000000030150</Data>
<Data Name="MciAddr">0x0</Data>
<Data Name="MciMisc">0xd01a0ffe00000000</Data>
<Data Name="ErrorType">9</Data>
<Data Name="TransactionType">0</Data>
<Data Name="Participation">256</Data>
<Data Name="RequestType">5</Data>
<Data Name="MemorIO">256</Data>
<Data Name="MemHierarchyLvl">0</Data>
<Data Name="Timeout">256</Data>
<Data Name="OperationType">256</Data>
<Data Name="Channel">256</Data>
<Data Name="Length">936</Data>
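
For anyone trying to read the raw fields: below is a minimal Python sketch that pulls the `Data` fields out of the event XML and decodes `MciStat` using the generic x86 machine-check status layout. Some assumptions apply. The sketch expects a complete, well-formed export, and because the excerpt above stops at `Length`, it embeds a trimmed copy instead. It decodes only the architectural status flags and the low 16-bit error code. Treating bits 21:16 as AMD's extended error code is also an assumption, and the sketch does not interpret that value.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed, well-formed copy of the EventData fields from the post above.
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="ApicId">0</Data>
    <Data Name="MCABank">5</Data>
    <Data Name="MciStat">0xbaa0000000030150</Data>
  </EventData>
</Event>"""

# Architectural MCA status flags: bit position -> name.
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Fields of a cache-hierarchy error code, pattern 000F 0001 RRRR TTLL.
REQUESTS = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TRANSACTIONS = ["Instruction", "Data", "Generic", "Reserved"]
LEVELS = ["L0", "L1", "L2", "Generic"]


def event_data(xml_text):
    """Return the <Data Name="..."> fields of a WHEA event as a dict of strings."""
    root = ET.fromstring(xml_text)
    return {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}


def decode_mci_status(value):
    """Decode the generic parts of an MCA status register value."""
    code = value & 0xFFFF  # low 16 bits: MCA error code
    result = {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (value >> bit) & 1],
        "error_code": code,
        "ext_error_code": (value >> 16) & 0x3F,  # bits 21:16 (assumed AMD extended code)
    }
    # Ignore bit 12 (a filter flag in Intel's encoding) and look for the 0001 marker.
    if (code & 0xEF00) == 0x0100:
        rrrr = (code >> 4) & 0xF
        result["request"] = REQUESTS[rrrr] if rrrr < len(REQUESTS) else f"reserved({rrrr})"
        result["transaction"] = TRANSACTIONS[(code >> 2) & 0x3]
        result["level"] = LEVELS[code & 0x3]
    return result


fields = event_data(EVENT_XML)
status = decode_mci_status(int(fields["MciStat"], 16))
print("Bank", fields["MCABank"], status)
```

For this post's `MciStat` of `0xbaa0000000030150`, the decoder reports VAL, UC, EN, MISCV and PCC as set. That means an uncorrected error with the processor context corrupt, which is consistent with Windows treating it as fatal. The cache-hierarchy fields decode as IRD / Instruction / L0. These appear to line up with the `RequestType` (5), `TransactionType` (0) and `MemHierarchyLvl` (0) values WHEA already logged, so the decode mostly confirms them.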



I'm in the same boat. I noticed someone here mentioning that it always happens under low load, and that is typical here too. It always happens when I'm getting back to it in the morning or after a longish break, never when I'm actually working.

I have everything set to default. I just reduced the RAM speed to 2133 MHz, as suggested in the overclocking.net thread, to see if that stabilizes things.

It's a real bummer, since this is primarily a workstation that is meant to be stable at all times, especially considering the amount of money that has been thrown at it.

Gigabyte X570 Aorus Master
AMD Ryzen 9 5950X
Kingston HyperX Predator 128GB DDR4 3600
Gigabyte RTX 3070 Gaming OC 8G
Samsung 980 PRO M.2 1TB x 2


It's been over 60 hours and my PC is now stable. It turns out it was HWiNFO64 that caused my PC to throw WHEA error 18 and reboot.

[ AMD Ryzen 9 5950X | NZXT Kraken x63 | MSI MPG X570 GAMING PRO CARBON 7B93v1B | 64GB DDR4 3600MHz G.Skill Neo 16-19-19-39 | ASROCK 6900XT Phantom Gaming D | EVGA G3-1000 | SSD 970 EVO NVMe M.2 1TB ]

If you believe your CPU might be faulty you can apply for an RMA here and we will troubleshoot with you to determine if your processor is indeed faulty. https://www.amd.com/en/support/kb/warranty-information/rma-form

Hey man. Have you found any solution? We have the same exact error.


There's a post on here about a voltage offset that you put into the curve in BIOS that supposedly fixes this. I would search that out for your board. Without knowing full specs, I have no way of looking it up for your particular setup. @Erantel is the poster with a solution for the offset in a similar post. Click his name and look for his posts about 5950X crashing.

"It worked before you broke it!"

My system seems to have been stable for nearly 48 hours. I updated my BIOS to F33a (previously F30), which might have fixed it. I also removed all the garbage Gigabyte software from Windows; not sure if that had anything to do with it.

Arseni009
Journeyman III

I have an AMD 5950X, an ASRock X570 board, 32 GB of DDR4 RAM, a 2 TB SSD, an 850 W PSU, and the latest drivers. To find the problem I used Prime95; at the beginning or middle of a game I would get an access error or a fatal error. AMD boost is on Auto. I was running DDR4 at 3200 MHz, and apparently the 5950X doesn't like this speed; the highest speed I found working is 3066 MHz. So if you have any errors (most likely CPU), use Prime95, lower your memory speed, or dial back your CPU overclock settings. It took me only one day to figure it out.

One problem is setting anything to "Auto" while trying to overclock an AMD Zen 3. Second, you don't downclock the RAM. Make sure the RAM is on the QVL; either way, skip XMP and enter the timings from the package manually, including the voltage. Set your Infinity Fabric clock (IF) to half the RAM's rated speed, for example DDR4-3200 = 1600 IF, 3600 = 1800, 3733 = 1866, 3933 = 1966, and, if you got a good 5950X, 4000 = 2000 IF. You have an ASRock motherboard, and so do I; they are very picky with RAM. If you have two sticks, the first one had better be in the second slot and the second in the fourth slot. If your RAM isn't on the QVL, it's time to break out the math or use DRAM Calc to get close to what the RAM should run at, or better. If you're using PBO, turn it off until you get the RAM stable.
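
The half-speed rule above is simple arithmetic. DDR4 moves data on both clock edges, so the real memory clock is half the rated MT/s figure, and running the Infinity Fabric 1:1 means setting FCLK to that clock. Here is a minimal sketch (the function name is made up for illustration):

```python
def fclk_for_1to1(ddr_rate_mts: int) -> int:
    """Return the FCLK (MHz) that runs 1:1 with DDR4 at `ddr_rate_mts` MT/s.

    DDR transfers data on both clock edges, so the real memory clock is half
    the rated transfer rate; 1:1 means FCLK equals that memory clock.
    Odd ratings round down (e.g. 3733 -> 1866).
    """
    return ddr_rate_mts // 2


for rate in (3200, 3600, 3733, 3933, 4000, 3603):
    print(f"DDR4-{rate}: FCLK {fclk_for_1to1(rate)} MHz")
```

For the original poster's DOCP DDR4-3603, this gives 1801 MHz, effectively the 1800 MHz FCLK already set. Odd ratings such as 3733 and 3933 round down to 1866 and 1966, matching the figures quoted in the reply.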

This is coming from a guy with a 5600X running 4.5 GHz all-core at 1.15 V, liquid cooled, with Team Dark Extreeme Gaming 3733 DDR4 at 3966 MHz/1.4 V. My 3600X rig can run a hearty 4.4 GHz all-core as well, and run this same RAM at this speed, on an ASRock X570 Phantom Gaming 4S. Right now that unit has two sticks of non-sense G.Skill 3600 (running at 3600) in it and is purely a backup. If your 5950X cannot run 3200 MHz RAM, you either have something above set wrong or you need an RMA.

Pxartist
Journeyman III

I would be willing to bet that your issue has to do with using PBO.

When I built my first system (also an AMD 5950X), I followed this lad's guide: https://youtu.be/dU5qLJqTSAc?t=1143 to undervolt and optimize my CPU via the BIOS. After crashing my system 70+ times and reinstalling Windows twice (the first time due to OS corruption, and the second time due to unforeseen OS corruption from a brief crash during installation), I had to come to grips with the fact that the settings the guy achieved were simply unattainable on my CPU.

From what I've read and come across, the 5950X is already so well optimized that there really isn't much to gain from hardcore tweaking, as suggested here: https://youtu.be/dfkrp25dpQ0?t=501

Regardless, I still ended up with better temperatures, but had to settle for these per-core Curve Optimizer offsets:

C0: -3
C1: -3
C2: -5
C3: -5
C4: -5
C5: -5
C6: -5
C7: -5
C8: -5
C9: -5
C10: -3
C11: -3
C12: -5
C13: -5
C14: -5
C15: -5

 

I've tried playing with the PBO offset and the PPT/TDC/EDC values, but I was never really able to get anything stable. Even just changing the PBO Max Boost offset by the slightest amount was enough to crash the system.

I would not be surprised if it had to do with your PBO Max CPU Boost Clk Override-200Mhz setting. I would highly advise that you reoptimize your system and start over.

Slowly tweak your settings and run benchmarks like Cinebench R23, Unigine, and Prime95 to verify system stability. The Ryzen Balanced and Ryzen Performance Windows power plans are designed for Ryzen 3000, so I would switch back to the usual Windows power plan (Balanced).
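
The "slowly tweak and test" advice amounts to a one-step-at-a-time search per setting. Below is a hedged Python sketch of that loop for a single core's Curve Optimizer offset, like the per-core values listed above. The `is_stable` callback is hypothetical. In practice, it stands for setting the value in BIOS, booting, running Cinebench R23 or Prime95, and checking for crashes or WHEA errors. The -30 floor is an assumed limit, not a value taken from this thread.

```python
from typing import Callable


def find_stable_offset(core: int,
                       is_stable: Callable[[int, int], bool],
                       start: int = 0,
                       step: int = 1,
                       floor: int = -30) -> int:
    """Lower one core's Curve Optimizer offset one step at a time and return
    the most negative offset that still passed testing.

    `is_stable(core, offset)` is a hypothetical hook: set the offset in BIOS,
    boot, run the stress tests, and return False on any crash or WHEA error.
    `floor` (-30) is an assumed lower limit, not a value from this thread.
    """
    best = start
    offset = start - step
    while offset >= floor:
        if not is_stable(core, offset):
            break  # first failure: keep the last value that passed
        best = offset
        offset -= step
    return best


# Stand-in test that "passes" down to -5 on every core (simulation only).
def fake_test(core: int, offset: int) -> bool:
    return offset >= -5


print(find_stable_offset(core=2, is_stable=fake_test))
```

Several people in this thread saw the WHEA reboot at idle rather than under load, so one passing benchmark run is weak evidence. It would make sense for whatever `is_stable` stands for to include idle time as well.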

Before you do so, I would like to warn you that excessive and aggressive testing (nonstop crashes) can and will corrupt your OS and force you to reformat the drive and reinstall the OS. So I highly encourage you to either get another drive with an OS to test system stability and tweak the BIOS with, or back up your files.

EDIT: I just realized that I was replying to a post more than a year old.


I've noticed my machine (Gigabyte X570S Aorus Elite, BIOS F5c) getting hit with this same issue.

I upgraded from a 3950X to a 5950X. The 3950X was rock-stable, but after upgrading to the 5950X I noticed this issue occurring. I'm using Windows 11 here; not sure if that's relevant.


What I do notice is that it tends to occur when I am running Folding@home at full tilt, fully utilizing the CPU and GPU.
I have the latest chipset drivers for Windows 11, and I'm using the Ryzen High Performance power plan.

Interestingly, I've never noticed the reboot occurring. It seems to happen after I lock the screen and walk away; I come back to find the PC has reset itself, with these messages in the System event log:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 23
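
A common rule of thumb for finding which core an APIC ID points at is to divide it by the number of SMT threads per core. The sketch below assumes two threads per core and contiguous APIC ID numbering on the 5950X. That is an assumption, not something reported in this thread, so check it against your own system's topology before acting on it.

```python
from typing import Tuple


def apic_to_core(apic_id: int, threads_per_core: int = 2) -> Tuple[int, int]:
    """Return (physical core, SMT thread) for an APIC ID.

    ASSUMES contiguous APIC IDs with `threads_per_core` siblings per core;
    verify this against the actual topology before relying on it.
    """
    return divmod(apic_id, threads_per_core)


for apic_id in (0, 23):  # IDs reported in this thread
    core, thread = apic_to_core(apic_id)
    print(f"APIC ID {apic_id}: core {core}, thread {thread}")
```

Under that assumption, APIC ID 23 here would be core 11, second thread, and the original post's APIC ID 0 would be core 0, first thread. Comparing the IDs across several error entries shows whether the errors keep landing on the same core.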
