Processors

Luismo
Adept I

5600x / WHEA Logger 18 Cache Hierarchy Error / Kernel Power error 41

Hello all. I will try to keep this very simple:

Problem:
My new build with the specs below reboots randomly with no BSOD. It just freezes for a second, goes to a black screen, and reboots without completely losing power. The Windows Event Viewer logs WHEA Logger 18 (Cache Hierarchy Error) and sometimes just Kernel Power error 41.

When does this happen?
Anytime. Maybe at idle, maybe while watching YouTube or playing any game (demanding or not).

Specs: 
Ryzen 5 5600x
Crucial Ballistix 2x 16GB RAM
RTX 2070
Gigabyte Aorus Elite V2 (Rev 1.0)

Troubleshoot:
I have read almost every post about this error. I have an RMA in progress and they come for the CPU next Monday, but until then I want to run some tests with your help. So far I have: changed the PSU, updated the BIOS, run different BIOS versions that are supposedly more stable, updated the AM4 chipset drivers, done clean Windows installs on different hard drives, updated the graphics drivers, checked and changed all the power and energy options in Windows (including fast boot), tried disabling XMP, C-states, PBO, and Core Performance Boost, run the system with no overclock at all, and tried different SOC and DRAM voltages... Nothing helps.

The OCCT memory test crashes every time within 10 minutes, throwing the errors I mentioned.

In the dump files I got the following error:

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon. Try !errrec Address of the WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffff900f9154eb90, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000bea00000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000108, Low order 32-bits of the MCi_STATUS value.
BUGCHECK_CODE:  124

BUGCHECK_P1: 0

BUGCHECK_P2: ffff900f9154eb90

BUGCHECK_P3: bea00000

BUGCHECK_P4: 108

PROCESS_NAME:  smss.exe

STACK_TEXT:  
ffffc00f`3b0fa150 fffff801`6715989f     : ffff900f`9154eb70 00000000`00000000 ffff900f`9154eb90 00000000`00000022 : nt!LkmdTelCreateReport+0x13e
ffffc00f`3b0fa690 fffff801`67159796     : ffff900f`9154eb70 fffff801`00000000 00000065`00000000 00000065`d6bff9c0 : nt!WheapReportLiveDump+0x7b
ffffc00f`3b0fa6d0 fffff801`66fbda49     : 00000000`00000001 ffffc00f`3b0fab40 00000065`d6bff9c0 00000000`000001fc : nt!WheapReportDeferredLiveDumps+0x7a
ffffc00f`3b0fa700 fffff801`66dcf487     : 00000000`00000000 ffff900f`916ea030 00000000`00000103 00000000`00000000 : nt!WheaCrashDumpInitializationComplete+0x59
ffffc00f`3b0fa730 fffff801`66c071b8     : ffff900f`952c0000 ffff900f`952d7060 ffffc00f`3b0fab40 ffff900f`00000000 : nt!NtSetSystemInformation+0x1f7
ffffc00f`3b0faac0 00007ffb`9140f4e4     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28
00000065`d6bff968 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffb`9140f4e4


MODULE_NAME: AuthenticAMD

IMAGE_NAME:  AuthenticAMD.sys

STACK_COMMAND:  .thread ; .cxr ; kb

FAILURE_BUCKET_ID:  LKD_0x124_0_AuthenticAMD_BANK0_MSCOD0000_MCACOD0108_PCC_UC_IMAGE_AuthenticAMD.sys

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {509acb9c-038f-dfdb-adc2-7917670271d1}

Followup:     MachineOwner

Any help is really appreciated.
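For readers who want to see what Arg3 and Arg4 in the dump above encode, here is a minimal sketch that joins the two halves of the MCi_STATUS value and pulls out the flags. It assumes the architectural x86 machine-check bit layout (bits 63 to 57, plus the 16-bit MCA error code in the low word) and the usual compound format for cache hierarchy codes; vendor-specific bits are left undecoded, and the function names are made up for illustration.

```python
# Sketch: decode the MCi_STATUS halves shown as Arg3/Arg4 of bug check 0x124.
# Assumption: architectural x86 machine-check bit positions. Vendor-specific
# bits (for example bits 56..32) are intentionally not interpreted here.

ARCH_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # a previous error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds an address
    57: "PCC",    # processor context corrupt
}

TRANSACTION = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
LEVEL = {0b00: "level 0", 0b01: "level 1", 0b10: "level 2", 0b11: "generic level"}


def decode_mci_status(high32: int, low32: int) -> dict:
    """Combine both 32-bit halves and extract flags and error codes."""
    status = (high32 << 32) | low32
    flags = [name for bit, name in sorted(ARCH_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "status": status,
        "flags": flags,
        "mscod": (status >> 16) & 0xFFFF,  # model-specific error code
        "mcacod": status & 0xFFFF,         # architectural MCA error code
    }


def describe_mcacod(mcacod: int) -> str:
    """Describe codes of the form 000F 0001 RRRR TTLL (cache hierarchy)."""
    if (mcacod & 0xEF00) != 0x0100:
        return "not a cache hierarchy code (not decoded in this sketch)"
    request = (mcacod >> 4) & 0xF
    transaction = TRANSACTION.get((mcacod >> 2) & 0b11, "reserved")
    level = LEVEL[mcacod & 0b11]
    return (f"cache hierarchy error, request type {request}, "
            f"{transaction} transaction, {level}")


if __name__ == "__main__":
    decoded = decode_mci_status(0xBEA00000, 0x00000108)
    print(hex(decoded["status"]), decoded["flags"])
    print(describe_mcacod(decoded["mcacod"]))
```

With the values from this dump, the UC and PCC flags and the 0x0108 code line up with the `MSCOD0000_MCACOD0108_PCC_UC` part of the FAILURE_BUCKET_ID. This only confirms that the CPU reported an uncorrected cache hierarchy error; it does not say which component caused it.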

0 Likes
14 Replies
Luismo
Adept I

Checked temperatures. All fine. The CPU at 100% load does not even reach 50 °C.

My PSU is a Corsair RM850

0 Likes

AGESA 1.1.9.0 will probably fix the problem but I believe Gigabyte hasn't released any beta BIOS with it yet. Until that happens you can disable CPB to fix the issue; just clock your CPU at whatever all-core speed it can handle until then.

0 Likes

Hello, thanks for the advice :)

I just disabled CPB and set the CPU to 3.7 GHz with the base clock at 100.00.

The OCCT memory test still gives the Cache Hierarchy Error.

I also tried modifying the VDDP and VDDG voltages, with no positive results either.

0 Likes

It's still rebooting as well? That's unusual; you must have a different problem.

0 Likes

Yes :(

I ran the system with the stock BIOS settings and it gave a Kernel Power 41 about 10 minutes after launching and playing Cyberpunk.

I do not know if it is CPU, mobo, or RAM related (Prime95 generates no crashes and MemTest says the RAM is OK).

0 Likes

Running the OCCT RAM test with only one stick does not seem to trigger the error.

Maybe a faulty stick? Maybe the RAM connections on the mobo?
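One way to separate "faulty stick" from "faulty slot" is to test each stick alone in each slot and see which one fails everywhere. The sketch below only does the bookkeeping for such a plan. The stick and slot labels are hypothetical (check the board manual for the real slot names), and the pass/fail results have to be entered by hand after each run.

```python
# Sketch: plan one-stick-at-a-time tests and flag what fails in every test.
# Stick and slot labels are placeholders, not taken from the board manual.
from itertools import product


def single_stick_plan(sticks, slots):
    """Return every (stick, slot) pairing to test with a single stick installed."""
    return list(product(sticks, slots))


def suspects(results):
    """results maps (stick, slot) -> True if the memory test passed.

    A stick is a suspect if it failed in every slot it was tested in;
    a slot is a suspect if it failed with every stick tested in it.
    """
    sticks = {stick for stick, _ in results}
    slots = {slot for _, slot in results}
    bad_sticks = sorted(
        s for s in sticks
        if not any(ok for (stick, _), ok in results.items() if stick == s)
    )
    bad_slots = sorted(
        t for t in slots
        if not any(ok for (_, slot), ok in results.items() if slot == t)
    )
    return bad_sticks, bad_slots


if __name__ == "__main__":
    plan = single_stick_plan(["stick_1", "stick_2"], ["A1", "A2", "B1", "B2"])
    for stick, slot in plan:
        print(f"Test {stick} alone in slot {slot}")
```

If no single stick or slot fails but the pair together does, that points away from one bad module and toward the dual-channel configuration, the memory controller, or the board.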

0 Likes
gavintek
Adept I

Hello, what settings do you use for the memory test in OCCT? Do you set it to auto?

0 Likes

Hello, yes, the OCCT memory test settings are on auto, with memory usage at 80% (the default for the test).

memoria.PNG

I also ran some OCCT CPU tests in normal and extreme modes:

occt cpu.PNG

Both tests gave the same error in Event Viewer:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 10

The details view of this entry contains further information.

For more information, I ran WhoCrashed on the dumps, and here is the result for one of them:

error.PNG

I also ran MemTest again, and the log gives me this error:

2021-01-13 02:21:48 - GetAMDCPUTemps - Unsupported AMD CPU (Vendor ID: AuthenticAMD 19 21 0)

I have read that MemTest may not read the temperature sensors properly on this CPU. My CPU temps are quite good anyway, but I mention it just in case.

 

0 Likes

It just happened again while watching a YouTube video.

errorwhea.PNG

0 Likes

My system is similar to yours: Ryzen 5 5600X, Corsair 16GB 3600MHz, Asus ROG Strix B550-F, 650W PSU. I am able to run both OCCT and the OCCT memory test without crashes (XMP/DOCP enabled, PBO disabled), so I think one of your components must be defective?

I'm no expert, but in my experience a WHEA 18 error does not always come from the CPU. It could come from any other part. I had this issue with my RX 5600 XT GPU (green-screen reboot when playing Total War games). I swapped to an Nvidia GPU 2 days ago and so far, no more WHEA 18 crashes. I don't know yet if this fixed the issue; I will have to see over the next 1-2 weeks.

From what I've gathered around this forum, those with a faulty CPU (who got a new one through RMA) also got WHEA 19 errors. Do you get WHEA 19 errors too?

And does the processor core always show the same number (number 4) in the WHEA 18 event log? If yes, maybe try disabling that core and running OCCT again.
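Checking whether the same processor ID keeps showing up is easier with a tally than by eye. Here is a minimal sketch, assuming the WHEA 18 events have been copied or exported from Event Viewer as plain text containing a line like "Processor APIC ID: 10" (the Spanish "Id. de procesador: 10" form is accepted too). The 80% threshold for calling one ID dominant is an arbitrary choice for illustration.

```python
# Sketch: count which processor IDs appear in exported WHEA event text.
# Assumption: each event contains a "Processor APIC ID: N" style line.
import re
from collections import Counter

ID_LINE = re.compile(
    r"(?:Processor (?:APIC )?ID|Id\. de procesador):\s*(\d+)",
    re.IGNORECASE,
)


def tally_processor_ids(event_text: str) -> Counter:
    """Count how often each processor ID appears in the exported text."""
    return Counter(int(match) for match in ID_LINE.findall(event_text))


def dominant_id(counts: Counter, threshold: float = 0.8):
    """Return the ID behind at least `threshold` of the events, else None."""
    total = sum(counts.values())
    if total == 0:
        return None
    cpu_id, hits = counts.most_common(1)[0]
    return cpu_id if hits / total >= threshold else None


if __name__ == "__main__":
    sample = """
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 10
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 4
    """
    counts = tally_processor_ids(sample)
    print(counts, dominant_id(counts))
```

A single dominant ID would make disabling that core a sensible experiment; IDs spread across many cores would not.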

0 Likes

Hello :)

I have read some posts across several forums and you are right, these errors are not exclusively CPU errors.

Today I tried changing the GPU. I use a GeForce RTX 2070, and I swapped it for an older GTX 970 I have in a little server to test the system. The problem persisted.

I also tried disabling the external audio card I use to record vocals, guitar, etc. The problem persisted.

Do you get whea 19 error too?

No, I have never had the WHEA 19 error.

Does the processor core always show the same number (number 4) on the whea 18 event log? 

No, the processor ID changes every time the error pops up.

Right now my biggest suspect is the CPU, then the mobo, and last the RAM (I ran MemTest and it gave me the error I posted in the OP).

Any advice is welcome :)

 

0 Likes

Hmm, a different core ID on every error might be a good indication that the problem is not the CPU. Usually when a CPU goes bad, it starts with one of the cores acting up (unless the manufacturer botched the architecture on your unit). I could be wrong though; as I said, I'm no expert at this.

WHEA 18 errors are tricky to troubleshoot, as the error doesn't tell you much and can be triggered by pretty much anything (hardware, BIOS, power failure, etc.). The best bet is to swap each component in turn to sort the problematic hardware from the good. You seem to have ruled out the GPU and PSU, so I guess your next move would be to test your CPU in another motherboard?

 

edit: this thread might interest you https://community.amd.com/t5/processors/just-bought-a-new-build-ryzen-5-5600x-gt-cpu-failure/td-p/42...

0 Likes

Thank you for the recommendation.

I already saw that post and read all the comments. I tried everything related to BIOS changes or CPU/RAM configuration. I updated the BIOS to the latest version, and I even tried going back to versions where people said they stopped having the issue, but with no results.

I have read that Gigabyte tends to release its BIOS updates a little late, so maybe that is the fix.

Hearing that it may not be the CPU is somewhat bad news for me, but I have already asked for an RMA.

Unfortunately, I do not have another motherboard to test my CPU in.

0 Likes

Well, for anyone having this issue: I just RMA'd both the Ryzen and the motherboard, so...

Either the processor was faulty, or the mobo revision or the mobo itself was faulty.

Not much help, but it is something.

0 Likes