mrm21632
Journeyman III

[5950X] WHEA-Logger Event 18 + Kernel-Power Event 41 (63) After RMA'ing

First, here are my current system specs, so you know what I'm working with:

  • Use Case: Hobbyist and/or professional development (multi-platform, web dev, probably some ML/AI), lighter gaming (around 2-3 hrs/day)
  • CPU: Ryzen 9 5950X
  • GPU: EVGA XC3 Ultra RTX 3070 (UV'd to 1965MHz @ 900mV, Mem +1000MHz)
  • Memory: 64GB Crucial Ballistix @ DDR4-3600 (16-18-18-38, Single-rank Micron rev.B)
  • Motherboard: MSI MEG X570 Unify (BIOS revision A91 Beta)
  • PSU: SeaSonic Focus PX-750 80+ Platinum Certified
  • Case: Lian Li PC-O11 Dynamic (non-XL) w/ 6x be quiet! Pure Wings 2 PWM
  • OS: Windows 10 Pro 19042.870
  • Relevant System Drivers: AMD Chipset 2.11.26.106, GeForce Game Ready Driver 461.92
  • Other Relevant Info: Virtualization is currently enabled (for VirtualBox), a TPM 2.0 module is installed and enabled

Figured I'd see if the Community Forum could offer some advice for the problem I've been having with my new AMD PC. I've been experiencing serious stability issues on and off pretty much since I first built it in December, and still haven't found a definitive solution. Even after RMA'ing my first 5950X, which brought me some much-needed stability for about a month, the issues returned and seem worse than before.

I consistently experience a full system crash (no BSOD, just black screen and restart), often within 15-30 minutes after startup, and sometimes in less than 10 minutes. This seems to be triggered each time by a Kernel-Power critical event (ID 41, task category (63)) followed very closely (i.e., often less than 10 seconds) by a WHEA-Logger error event (ID 18). The WHEA-Logger event is usually a Cache Hierarchy error on (to my knowledge) the weakest core in the CPU, but occasionally I'll see a Bus/Interconnect error on either that same core or core 0.
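For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch of how I pair the two events up. It assumes the events have already been exported from Event Viewer into a list of records. The field names (`time`, `source`, `id`), the function name `paired_crashes`, and the sample timestamps are all made up for illustration and are not real log entries from my system.

```python
from datetime import datetime, timedelta


def paired_crashes(events, window_seconds=10):
    """Find Kernel-Power ID 41 events that are followed by a
    WHEA-Logger ID 18 event within `window_seconds`.

    `events` is a list of dicts with hypothetical keys
    'time' (datetime), 'source' (str) and 'id' (int).
    """
    window = timedelta(seconds=window_seconds)
    ordered = sorted(events, key=lambda e: e["time"])
    pairs = []
    for i, ev in enumerate(ordered):
        if ev["source"] != "Kernel-Power" or ev["id"] != 41:
            continue
        # Look only at later events that fall inside the time window.
        for later in ordered[i + 1:]:
            if later["time"] - ev["time"] > window:
                break
            if later["source"] == "WHEA-Logger" and later["id"] == 18:
                pairs.append((ev, later))
                break
    return pairs


# Made-up sample data: one crash with a matching WHEA event,
# and one Kernel-Power event with no WHEA event close to it.
sample = [
    {"time": datetime(2021, 3, 20, 18, 0, 0), "source": "Kernel-Power", "id": 41},
    {"time": datetime(2021, 3, 20, 18, 0, 6), "source": "WHEA-Logger", "id": 18},
    {"time": datetime(2021, 3, 21, 9, 30, 0), "source": "Kernel-Power", "id": 41},
    {"time": datetime(2021, 3, 21, 9, 45, 0), "source": "WHEA-Logger", "id": 18},
]

matches = paired_crashes(sample)
print(len(matches))
```

With the sample above, only the first Kernel-Power event counts as a pair, since the second WHEA-Logger event arrives 15 minutes after its Kernel-Power event. The 10-second window is just the gap I usually see, so adjust it as needed.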

Here is what I've done so far for troubleshooting:

  • I have swapped out memory kits several times. I started with a G.SKILL TridentZ RGB 32GB kit, then swapped that for a Corsair Vengeance LPX 16GB kit, then finally swapped that for the two Ballistix kits I'm using. I've tested the last two kits fairly thoroughly (e.g., at least 1500-2000% coverage in Karhu) both with and without XMP profiles enabled, and never experienced an error while testing.
  • While the first CPU was out for RMA, I ran an R5 3600 in the system and never experienced a single crash.
  • I've updated the BIOS several times, including several vA8X beta versions from MSI. I'm not sure if this ever made a difference, but I was on vA80 while the system was stable.
  • I have tested with and without the GPU overclock/undervolt settings, but this doesn't seem to make much of a difference.
  • I have experimented quite a bit with CPU settings. Disabling CPB seems to make the system stable, albeit at a significant performance cost. Disabling PBO completely doesn't seem to make a difference - if CPB is still enabled, the system will usually crash. Manually setting PBO limits (PPT, TDC, and EDC) didn't seem to make a difference either.

At this point, I'm really not sure what could be causing the issue. I think I can safely rule out the memory, and I'm not yet convinced it's the CPU. The only things I haven't really tried and am considering are:

  • Removing the TPM module. Not sure how likely it is this could cause an issue, but it's a factor I haven't yet isolated.
  • Replacing the power supply. PSUs are a little hard to come by, so this hasn't been top priority. Still, it's entirely possible there's an issue with the PSU: either it isn't supplying enough power for some reason (the wattage rating should be high enough), or the unit is simply faulty.
  • Reinstalling Windows. Not sure how likely this is to solve the problem, but apparently some people have fixed similar issues by reinstalling Windows and the system/chipset drivers. Worth a shot, I guess.
  • Replacing the motherboard. Worst-case scenario by far. Again, not sure how likely this is, but it's very possible that something isn't working correctly on the motherboard itself.

I'd appreciate any advice you guys can offer. I'm honestly starting to run out of ideas and hope.
