
Processors

lakesideguy
Adept II

Ryzen 9 3900XT - WHEA event "A fatal hardware error has occurred" - "Cache hierarchy error"

Symptom:

Random reboots that ALWAYS occur during 2D graphics use, with sound and video playing in the Edge browser.

Event Viewer report (NO BSOD just a reboot of system):

The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error

EDIT (found GUID in registry) :

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WINEVT\Channels\Microsoft-Windows-Kernel-Power/Diagnostic

- Provider
   [ Name] Microsoft-Windows-Kernel-Power
   [ Guid] {331c3b3a-2005-44c2-ac5e-77220c37d6b4}
  EventID 41
  Version 8
  Level 1
  Task 63
  Opcode 0
  Keywords 0x8000400000000002
- TimeCreated
   [ SystemTime] 2021-05-09T12:53:08.0082874Z
  EventRecordID 4812
  Correlation
- Execution
   [ ProcessID] 4
   [ ThreadID] 8
  Channel System
  Computer REMOVED
- Security
   [ UserID] S-1-5-18
- EventData
  BugcheckCode 0
  BugcheckParameter1 0x0
  BugcheckParameter2 0x0
  BugcheckParameter3 0x0
  BugcheckParameter4 0x0
  SleepInProgress 0
  PowerButtonTimestamp 0
  BootAppStatus 0
  Checkpoint 0
  ConnectedStandbyInProgress false
  SystemSleepTransitionsToOn 148
  CsEntryScenarioInstanceId 0
  BugcheckInfoFromEFI false
  CheckpointStatus 0
  CsEntryScenarioInstanceIdV2 0
  LongPowerButtonPressDetected false
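
For anyone reading a similar Kernel-Power 41 entry: the EventData above shows BugcheckCode 0 and PowerButtonTimestamp 0. As far as I understand Microsoft's guidance for this event, that means Windows recorded neither a stop error nor a long power-button press, which fits a sudden reset with no BSOD. Below is a minimal Python sketch of that triage. It assumes the fields were copied by hand into a dict; the function name and the dict layout are made up for illustration and are not part of any Windows API.

```python
def classify_kernel_power_41(event_data):
    """Rough triage of a Kernel-Power 41 event from its EventData fields.

    event_data is a hypothetical dict of field name -> string value,
    typed in from the Event Viewer details pane.
    """
    # int(..., 0) accepts both decimal ("0") and hex ("0x124") strings.
    bugcheck = int(str(event_data.get("BugcheckCode", "0")), 0)
    power_button = int(str(event_data.get("PowerButtonTimestamp", "0")), 0)

    if bugcheck != 0:
        return "stop error recorded (bugcheck 0x%X)" % bugcheck
    if power_button != 0:
        return "power button was held down"
    return "no bugcheck recorded: hang, hardware reset, or power loss"


# Values taken from the event posted above.
event = {"BugcheckCode": "0", "PowerButtonTimestamp": "0", "SleepInProgress": "0"}
print(classify_kernel_power_41(event))
```

This only narrows the cause down. It cannot tell a CPU fault from a PSU or board fault, which is why the WHEA entries matter more than the Event 41 itself.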

 

System Specs:

ASUS TUF GAMING X570-PRO (WI-FI) with BIOS 3801 <------- SUSPECT the AGESA; I will roll back to BIOS 3602 (3/12/2021), from before the 5000 series updates

64GB of Corsair RAM, model CMW32GX4M2Z3200C1600 (Corsair Vengeance RGB Pro 32GB (2x16GB) DDR4 3200 (PC4-25600) C16 AMD Optimized Memory)

RAM installed has RGB and fan cooler

EVGA SuperNova G3 Gold 1000W PSU

Nvidia ASUS GeForce ROG Strix 1070 Ti 8GB GPU, driver version 466.27

Sabrent Rocket PCI-E Gen 4 1TB with large heatsink

Corsair iCUE H150i RGB 360mm water cooler

Creative Soundblaster AE5

Logitech Z906 Surround Sound THX-Certified 5.1 Speaker System

Corsair K95 Platinum XT RGB keyboard

Corsair Dark Pro RGB wireless mouse

Western Digital RED 6TB backup and data storage drive

Microsoft Windows 10 Professional version 20H2

AMD Chipset driver version 2.13.27.501

CyberPower CP850AVRLCD Intelligent LCD UPS System, 850VA/510W, 9 Outlets, AVR, Mini-Tower, Black

-------------------------------------------------------------------------------------------------------------------------

Summary of problem and attempts to resolve:

Here is what I've done to locate the problem, and why I believe the CPU is causing it.

1) Replaced the NVMe drive with a PCI-E 4 upgrade

2) The Corsair RAM is rated for 3200. Turned DOCP off, and it ran at the default of 2600.

3) Replaced and upgraded the motherboard from an ASUS TUF GAMING X570-PLUS (WI-FI) to an ASUS TUF GAMING X570-PRO (WI-FI).

4) Flashed both motherboards' BIOS to the latest versions.

5) Swapped the power supply (I have two) between a G2 750 PSU and the G3 1000 PSU

6) Reinstalled Windows in all cases

7) Upgraded the water cooler from a Cooler Master 270mm to the Corsair 360mm, so it's not heat related at all

8) Everything runs at BIOS defaults and nothing is manually overclocked

9) All drivers, including chipset, are installed and recent in all cases

The only thing NOT replaced is the CPU itself, and a 3900XT is not cheap, so it's the last resort. I like to do video processing and gaming, and the system can definitely do it all and is well prepared for years to come! But more than likely either the CPU or AMD's AGESA PI took a dump.

The Windows event error appears to be spot on with the problem, going by the time it happened. It is occurring more frequently now. Finding it was not cheap either, so I'm a bit ticked off.
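
To keep track of how often the error comes back, the WHEA entries can be pulled from the System log with the built-in wevtutil tool instead of scrolling through Event Viewer. Below is a small Python sketch that builds and runs that query. The helper names are my own, and the only thing taken from this thread is the provider name Microsoft-Windows-WHEA-Logger; it assumes a Windows machine with wevtutil on the PATH.

```python
import subprocess
from typing import List


def whea_query_command(count: int = 20) -> List[str]:
    """Build a wevtutil command that lists the newest WHEA-Logger events."""
    # XPath filter on the provider name shown in the event details.
    xpath = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"
    return [
        "wevtutil", "qe", "System",  # query-events on the System log
        "/q:" + xpath,               # only WHEA-Logger entries
        "/c:" + str(count),          # limit the number of events returned
        "/rd:true",                  # newest first
        "/f:text",                   # plain-text output
    ]


def recent_whea_events(count: int = 20) -> str:
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        whea_query_command(count), capture_output=True, text=True, check=True
    )
    return result.stdout


# On the affected Windows machine:
#     print(recent_whea_events(10))
```

Comparing the dates in that output before and after a BIOS rollback gives a simple way to tell whether a change actually made the errors less frequent.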

The CPU will be RMA'd if flashing back to AGESA V2 PI 1.2.0.1, the version from before the USB fix was put in for the 5000 series, does not fix it. I posted this for the benefit of others and so AMD can see the system setup and what was done to attempt to fix it.

<<<<SUSPECTED ISSUE WITH THE AGESA PI 1.2.0.2 fix for USB issues and the 5000 series when a 3000 series CPU is installed>>>>
Version 3801 Beta Version
2021/04/09 21.01 MBytes

TUF GAMING X570-PRO (WI-FI) BIOS 3801
- Update AMD AM4 AGESA V2 PI 1.2.0.2
- Fix USB connectivity issue

It has taken roughly a month of isolating and upgrading parts. Going from a PCI-E 3 NVMe to PCI-E 4 was worth it anyway. I will be letting support know my displeasure at how far I've had to go to isolate the CPU or BIOS as the issue. But after researching this more, before I do the RMA I'm going to run the previous BIOS version, 3602 (March 12th, 2021), with AGESA V2 PI 1.2.0.1, and will report back if it happens again. I'm not being rude, and in the end it was my choice to spend the additional funds, but I wanted the problem found and out of my computer. Others experiencing this must be infuriated and frustrated!

NOTE:

AMD mentions that turning DOCP on could affect PCI-E Gen 4 devices.

I can tell you it does not affect the Sabrent Rocket. I swapped between it and a Crucial P1 1TB NVMe PCI-E Gen 3 drive, and the reboot and cache hierarchy error happen in BOTH cases. DOCP on or off, Gen 3 or Gen 4: none of it has made any difference.

NOTE 2:

I've learned that HWiNFO may also be causing it, so I'll be uninstalling it and reporting back.

1 Reply
lakesideguy
Adept II

Two weeks after rolling back to BIOS 3602 (AGESA V2 PI 1.2.0.1), something has happened.

And this is something I have had since I bought the processor in August of 2020, and I thought it was the video card.

I would get a random black screen and the TV would reboot. I haven't seen that in a few weeks either, and I thought it was a separate problem. It is not. The CPU caused that black screen and "monitor reboot". I had thought this could be my video card, until right after one of them I got a WHEA processor event, though the system recovered.

HWiNFO had nothing to do with any of my cases either, and removing it brought no improvement. Nothing has improved it.

The 3900XT is being RMA'd. After all this troubleshooting and spending, I am no longer convinced anything else is at fault.

Time to call AMD and initiate RMA.
 

A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

The details view of this entry contains further information.

 

System
  
- Provider
   [ Name] Microsoft-Windows-WHEA-Logger
   [ Guid] {c26c4f3c-3f66-4e99-8f8a-39405cfed220}
  EventID 19
  Version 0
  Level 3
  Task 0
  Opcode 0
  Keywords 0x8000000000000000
- TimeCreated
   [ SystemTime] 2021-05-21T23:19:28.2193164Z
  EventRecordID 9067
- Correlation
   [ ActivityID] {64caffe2-a24b-4ef2-8682-03a97103551e}
- Execution
   [ ProcessID] 6824
   [ ThreadID] 7796
  Channel System
  Computer TWISTEDSISTER
- Security
   [ UserID] S-1-5-19
- EventData
  ErrorSource 0
  ApicId 0
  MCABank 27
  MciStat 0x982000000002080b
  MciAddr 0x0
  MciMisc 0xd01a0ffe00000000
  ErrorType 10
  TransactionType 256
  Participation 0
  RequestType 0
  MemorIO 2
  MemHierarchyLvl 3
  Timeout 0
  OperationType 256
  Channel 256
  Length 936
  RawData435045521002FFFFFFFF03000200000002000000A80300001B131700150515140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131B248949139377F4BA8F1E0062805C2A37CE27E5F954ED70100000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000002000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000002000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000200000000000000000000000000000000000000000000007F010000000000000002040000030000100F87000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000100F8700000818000B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000B3F8F31CB1C5A249AA595EEF92FFA63C01000000000000009E07C0000400000000000000000000000000000000000000000000000000000000000000000000000200000002000000570B66BE974ED70100000000000000000000000000000000000000001B0000000B08020000002098000000000000000000000000FE0F1AD00000000000000000000500002E0001000100025A000000007D000000270000000000000000000000000000000000000000000000000010000000000000001000000000000000100000000000000010001B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
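
For anyone who wants to read the MciStat value (0x982000000002080b) instead of taking the event text on faith, here is a minimal Python sketch that splits it into the standard x86 machine-check status flags and the bus/interconnect error code layout (0000 1PPT RRRR IILL). The function and table names are mine. Treating bits 21:16 as an "extended error code" is my understanding of AMD's register layout and should be read as an assumption, not a reference.

```python
# Architectural flag bits in the upper part of an MCi_STATUS value.
MCI_STATUS_FLAGS = {
    63: "Val",       # register holds a valid error
    62: "Overflow",  # an earlier error was overwritten
    61: "UC",        # uncorrected error
    60: "En",        # error reporting was enabled
    59: "MiscV",     # MciMisc is valid
    58: "AddrV",     # MciAddr is valid
    57: "PCC",       # processor context corrupt
}
PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]            # PP field
MEM_OR_IO = ["memory", "reserved", "I/O", "generic"]    # II field
LEVEL = ["L0", "L1", "L2", "LG"]                        # LL field


def decode_mci_status(status):
    """Decode the flag bits and, for bus errors, the 16-bit error code."""
    flags = [name for bit, name in MCI_STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "corrected": "UC" not in flags,
        "error_code": code,
        # Assumption: bits 21:16 carry AMD's extended error code.
        "extended_code": (status >> 16) & 0x3F,
    }
    # Bus/interconnect errors have the form 0000 1PPT RRRR IILL.
    if code & 0xF800 == 0x0800:
        result["kind"] = "bus/interconnect"
        result["participation"] = PARTICIPATION[(code >> 9) & 0x3]
        result["timeout"] = bool((code >> 8) & 0x1)
        result["request_type"] = (code >> 4) & 0xF
        result["mem_or_io"] = MEM_OR_IO[(code >> 2) & 0x3]
        result["level"] = LEVEL[code & 0x3]
    return result


decoded = decode_mci_status(0x982000000002080B)
for key, value in decoded.items():
    print(key, "=", value)
```

For this value the UC bit is clear, which matches the "corrected hardware error" wording, and the low 16 bits decode as a bus/interconnect error with Participation 0, RequestType 0, MemorIO 2 and MemHierarchyLvl 3, the same numbers Windows lists in the EventData.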