robertbruce

WHEA / Machine check cache hierarchy error

Discussion created by robertbruce on Sep 9, 2020
Latest reply on Sep 9, 2020 by robertbruce

Hi,

 

I'm having the following entries pop up in my event log, mostly when idling:

 

As far as I can tell, it started yesterday for no apparent reason.

 

My specs are:

Ryzen 3900X @ Stock

ASUS RoG Strix x570-E

4x8 GB G.Skill F4-3600C16-8GVKC (multiple configurations tested)

RTX 2070 Super on slot 1

Got 1 NVMe SSD, an old SATA one and 2 SATA hard drives.

 

I built the whole thing in late June, so it's pretty recent too. The PC is not powered on 24/7 (far from it), and I left everything at stock except for the XMP RAM profile.

 

I'm not getting any crashes, all of these errors are "corrected hardware errors" and just get logged while not affecting anything (my benchmark scores are unchanged too).

 

Someone here seems to have had a similar issue, but no one answered: WHEA: Cache Hierarchy Error

 

The most helpful advice I've seen is to actually RMA the CPU. That would be a huge bummer considering how recent it is, how gently I've treated it, and how much fun I was having with the build until now. It's my very first Ryzen build, by the way.

 

I tried:

  • Stock RAM settings @ 2132 MHz -> No effect
  • Manual RAM settings @ 2400 MHz with very loose timings and 1.35 V -> No effect
  • SoC voltage at 1.15 V instead of 1.1 V -> No effect
  • VDDG at 1000 mV with SoC voltage at 1.15 V -> No effect
  • CPU VRM load line calibration set to level 3 -> No effect
  • Mild positive CPU voltage offset (+0.03 V) -> No effect
  • Unplugging all my USB devices -> No effect
  • Updating to the latest BIOS -> No effect, except my idle voltages seem to be even lower now
  • Using USB Flashback to install BIOS 1408 -> No effect (had high hopes for that one)

 

I also checked everything that could be Windows related (Ryzen power plans, chipset & GPU drivers, ...), but that can't be the cause: booting from an old Kali Linux USB drive I had around yields the same L1 cache error in syslog. Not as many entries as the Windows logs get, but still. And that's what worries me the most, since I was hoping for a Windows issue.
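In case anyone wants to compare counts under Linux, here is a rough sketch of how the syslog entries could be tallied per CPU. It assumes the kernel lines contain `[Hardware Error]: CPU:<n>` or `[Hardware Error]: CPU <n>:`, which can vary between kernel versions, and `count_errors_per_cpu` is just a name made up for illustration.

```python
import re
from collections import Counter

# Assumed line shapes (may differ between kernel versions):
#   "... [Hardware Error]: CPU:6 (...) MC0_STATUS[...]"
#   "... [Hardware Error]: CPU 6: Machine Check: ..."
CPU_FIELD = re.compile(r"\[Hardware Error\]: CPU[: ](\d+)\b")


def count_errors_per_cpu(lines):
    """Count hardware-error log lines per logical CPU number.

    `lines` is any iterable of strings, e.g. an open log file.
    Lines that do not match the assumed pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = CPU_FIELD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Usage would be something like `count_errors_per_cpu(open("/var/log/syslog", errors="replace"))`, with the log path adjusted for the distribution. Note that a single error may be logged across several lines, so the numbers are only good for comparing CPUs against each other, not as an exact error count.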

 

It also always seems to be CPU 6 and 18 under Linux (which could be the same physical core?).
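One way to check the "same physical core" guess would be to read the SMT sibling list that Linux exposes in sysfs. This is only a sketch: it assumes the standard `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` file is present, and the function names are made up for illustration.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '6,18' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu, root=SYSFS_CPU):
    """Return the logical CPUs that share a physical core with `cpu`."""
    path = Path(root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def same_physical_core(cpu_a, cpu_b, root=SYSFS_CPU):
    """True if the two logical CPUs are SMT siblings of one core."""
    return cpu_b in thread_siblings(cpu_a, root)
```

Calling `same_physical_core(6, 18)` on the affected machine would show whether both logical CPUs map to one core. If they do, the errors would point at a single physical core rather than two separate ones.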

 

BTW my temps are fine: the hottest Prime95 workload only takes me to about 75°C and yields 0 errors.

 

Is anyone else seeing the same entries in their event log?

 

Thanks a lot,
