Specs:
Ryzen 5 5600x
MSI MPG Gaming Plus B550
Crucial Ballistix 2 x 8GB 3600MHz DDR4
Corsair RM 750 Modular PSU
For the past month, I've had general instability (starting 05/08 through to 23/08, when it finally gave up): random crashes/reboots with no errors or slowness beforehand. I could replicate a crash by opening a few things in quick succession right after it had crashed; however, if I left it to "settle" it would be fine until the next random crash. On 23/08 I had a crash that left the machine stuck in a boot loop. It either starts booting and shows the spinning loading wheel, or it crashes straight after POST and reboots itself. I cannot boot into an OS, or even an installer for an OS (tried Ubuntu, Windows 10 21H2, and the most recent Windows 11 update).
I pulled the drive and popped it into my old PC just so I have something to use (along with the GTX 1080 that was in the system). I checked the logs and found nothing to suggest why it went off and/or failed to boot (although I imagine this is because it couldn't get into Windows). The last event from the Ryzen build was a system uptime event.
I've gone through the following with MSI, as I was thinking it might be a Mobo issue... but now I'm not so sure!
-> Everything disconnected bar one stick of memory, a keyboard, mouse and a boot device (swapping between my main windows and a live Ubuntu USB).
-> CMOS Cleared
-> Display cables and graphics card swapped (Tried a GTX 460 and an R9 285, in addition to the GTX 1080, all cards work in my other PC).
-> Disabling Global C State Control and PSS Support
-> Manual voltage of 1.3125, 1.325, and 1.35 on the CPU
-> Manual Voltage of 1.35 on the RAM
-> Plugging PC directly into a wall outlet rather than a power strip
-> 500w BeQuiet PSU (about 10 years old) instead of the Corsair one
-> Reseated CPU as well as all power connectors on the board
None of the above helped (the results were the same). I did however have a play with the CPU cores and noticed the PC will boot with the CPU Core Control option set to ONE (1+0). As soon as I switch it to TWO (2+0) I get stuck in the loop again. Same behavior if I turn SMT off. The CPU will boot off the first core/thread but no more.
I believe the RAM is good as both sticks give identical behavior on their own regardless of the Slot used on the Mobo, but don't have the means to test whether the CPU OR the Mobo is the culprit.
I did notice in the event logs a few events since I started having trouble, just after reboot from a black screen crash/reboot.
Event 18, WHEA-Logger
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2
The details view of this entry contains further information.
I can also see one with APIC ID: 3 (assuming the IDs count up from 0, I take these to be the two threads of the second core).
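As a rough sanity check on that assumption, the mapping can be sketched in a few lines of Python. This is only an illustration: it assumes APIC IDs are contiguous from 0 and that each core has two SMT threads, which may not hold for every platform or BIOS. The name decode_apic_id is made up for this example.

```python
def decode_apic_id(apic_id: int, threads_per_core: int = 2) -> tuple[int, int]:
    """Map an APIC ID to (core_index, thread_index), both zero-based.

    Assumption: IDs are contiguous from 0 and every core exposes
    `threads_per_core` SMT threads. Real APIC layouts can differ.
    """
    return divmod(apic_id, threads_per_core)


# The two APIC IDs reported in the WHEA-Logger events
for apic_id in (2, 3):
    core, thread = decode_apic_id(apic_id)
    print(f"APIC ID {apic_id}: core {core}, thread {thread}")
```

Under those assumptions both IDs land on core index 1, i.e. the second physical core, which would fit with the machine booting at 1+0 but not at 2+0.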
Something that developed after the second time it rebooted: the screen shows an array of solid colours (different each boot) before it reaches the BIOS drive selection screen (selection below).
Is there anything else I can attempt to do to fix this? Or any indication on whether the Mobo or CPU is more likely to be at fault? In an ideal world, I'd have a different processor I could slip in to test that! MSI had told me to go to the retailer for the replacement process of the board (Amazon) and I can also get retailer support on the CPU for another few months (Also Amazon).
Any help is appreciated.
The PC was built in March 2021 and has been solid up until last month. It has been running on BIOS defaults bar two settings
-> CSM Boot rather than UEFI (thus secure boot was disabled also)
-> XMP Profile 1 so the RAM runs at its rated speeds
The only new hardware introduced in its life was a keyboard/mouse and a USB hard drive I use for Steam games. The drive was the last thing introduced, about 3 months ago.
I would get a warranty replacement for the CPU, while you still can.
It does look like that is at fault; worth a try anyway.
Tweaking a bit further, manually setting all the cores to 2GHz seems to have gotten some response from it. The machine is definitely "slow" but I can get into Ubuntu with all cores enabled. Will see how far I can push it before it tips over again!
I actually contacted Amazon earlier and have arranged a return (surprisingly they've just straight up offered a refund rather than an exchange, though I shouldn't complain given the current prices!).
I did have another go after reading some articles about clocks. It looks like having 2+ cores enabled with anything over 4.0GHz trips it. Disabling Core Performance Boost when set to auto clocks, or just manually setting clocks to 4.0GHz or below, allows booting. My first couple of cores would happily flick up and down between roughly 2GHz and 4.7GHz as required during basic use. The weird colouring etc. before the device posts still occurs even with the reduced settings.
Will see how I get on when I've ordered a replacement CPU.
Got a replacement a few days ago - rock solid since then!
Such a strange way to go, instability for a few days then outright failure. Ah well - happy to be back up and running!