I built this PC 22 days ago with the following new parts:
Ryzen 5 3600 w/ Thermalright AM-14 cooler
MSI B550 Gaming Edge Wifi
16GB RAM Patriot Viper 3600 CL17
Sapphire RX 5700 XT Pulse
Seasonic Core GC Gold 650W
Kingston A2000 250GB
Crucial P1 1TB
This is the second time I have gotten the following error while gaming (after 2-3 hours of play):
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 8
I am not overclocking either the CPU or the GPU. The RAM is using A-XMP Profile 1.
The highest CPU temperature I see while playing is 72 Celsius. I tested the RAM using Memtest and no errors were found.
Is this a faulty CPU?
The motherboard is up to date with the latest drivers available.
Technically, you are overclocking your system by using the A-XMP profile for your memory.
Memtest often doesn't stress CPU+RAM hard enough to expose instability that only shows up later during gaming.
I suggest running Prime95 Blend for at least an hour with XMP enabled and checking the worker thread logs for rounding errors. If there are any, repeat the same test with XMP disabled or with manually tuned memory settings.
It is still possible that your CPU or motherboard is faulty. Either way, in my opinion CPU+RAM stability is worth checking with Prime95.
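Checking the logs for rounding errors can also be done with a small script instead of reading them by eye. This is a minimal sketch, assuming the log is a plain text file (the default name results.txt is an assumption) and that failures contain phrases like "FATAL ERROR", "Rounding was" or "Hardware failure". Those patterns are illustrative, not an official list of Prime95 messages, so compare them against what your own log actually prints.

```python
import re
from pathlib import Path

# Assumption: phrases typical of Prime95 torture-test failure lines.
# Illustrative only; adjust to the wording in your own log file.
ERROR_PATTERN = re.compile(r"fatal error|rounding was|hardware failure", re.IGNORECASE)


def find_errors(lines):
    """Return (line_number, text) for each line that looks like a stress-test error."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


def scan_log(path="results.txt"):
    """Read a log file (default name is an assumption) and print suspicious lines."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    hits = find_errors(text.splitlines())
    if not hits:
        print(f"{path}: no rounding or hardware errors found")
    for number, line in hits:
        print(f"{path}:{number}: {line}")
    return hits
```

Usage: run `scan_log("path/to/results.txt")` once after the XMP-enabled run and once after the XMP-disabled run; an empty result for both means the log contains none of the assumed error phrases.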
Thank you for the hints. I just finished testing: I ran Prime95 Blend for one hour and 15 minutes with default BIOS settings, then for the same length of time with A-XMP on, with no errors or reboots in either scenario.
Should I stress it a little longer? All three crashes happened while I was gaming, twice in Destiny 2 and once in Assassin's Creed Origins. CPU temperature is always around 68-72 degrees while gaming, but the GPU reached 92 degrees.
I don't think Blend tests longer than 1h15min are necessary in your case.
If you didn't get any rounding errors in the Prime95 worker logs, then your CPU+RAM are stable under load.
There are multiple posts online from people with specs similar to yours who had crashes with Cache Hierarchy Errors.
The error description suggests it is most likely a CPU/RAM overclocking issue or a faulty CPU/RAM, and most often that is the cause.
But there are also user reports where replacing their AMD GPU solved the problem.
I suggest resetting your BIOS settings, updating the chipset drivers from the AMD site, reinstalling your graphics driver with DDU, and setting PCI Express power management in the Ryzen Balanced power plan to "disabled" (or using "Ryzen High Perf."), then checking whether you still get the same error during gaming.
If that doesn't fix the crashes, I'd start the RMA process.
Thank you again.
On Sunday, I played for over an hour with PBO ON + AutoOC 200MHz and had no crashes.
Then I used the BIOS flash function to reinstall the BIOS, applied A-XMP Profile 2, and kept PBO OFF. I ran Memtest to over 400% coverage and got no errors. I also ran RealBench with no errors. I undervolted my GPU (Sapphire RX 5700 XT Pulse) and ran Unigine Heaven 27 times, also with no crashes. I also ran Prime95 Blend for another 1h15min, which, as in the picture you showed, didn't report any warnings or errors. I haven't had time to actually play yet, so I will try and see.
This is a new installation (I assembled the PC on June 19), so all drivers are the latest versions. I think only the video driver has had a new release since then, and I installed it. If it crashes again, I will definitely try a clean Windows 10 installation.
Just as an update, I got the error again after playing for several hours. This time, Windows Event Viewer showed two messages instead of one, the difference being Processor APIC ID: 8 in one and Processor APIC ID: 14 in the other. This happened after I updated to a new BIOS release, which I had hoped would fix it.
I undervolted the GPU to try to lower its temperatures, which worked, but I still got the same error.
I rearranged the RAM sticks this morning and left Memtest running for over six hours, now at over 700% coverage, and still no errors.
So, after all of that, I think it might be the CPU.
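When comparing several of these events, it can help to tally which APIC IDs they report, to see whether the errors keep coming from the same logical processor or are spread across several. This is a minimal sketch, assuming the events have been exported or copied as plain text in the same layout quoted in this thread ("A fatal hardware error has occurred.", "Error Type: ...", "Processor APIC ID: ..."). It does not map APIC IDs to physical cores, since that mapping depends on the CPU's topology.

```python
import re
from collections import Counter

# Field names copied from the WHEA event text quoted in this thread.
# Assumption: events are available as plain text in that same layout.
HEADER = "A fatal hardware error has occurred."
FIELD = re.compile(r"^\s*(Error Type|Processor APIC ID):\s*(.+?)\s*$", re.MULTILINE)


def parse_events(text):
    """Split the text into events and return (error_type, apic_id) for each one."""
    events = []
    for block in text.split(HEADER):
        fields = dict(FIELD.findall(block))
        if "Processor APIC ID" in fields:
            error_type = fields.get("Error Type", "unknown")
            events.append((error_type, int(fields["Processor APIC ID"])))
    return events


def tally_apic_ids(text):
    """Count how many events were reported against each APIC ID."""
    return Counter(apic_id for _, apic_id in parse_events(text))
```

With the two messages described above, the tally would show one event each for APIC IDs 8 and 14, i.e. two different logical processors rather than one repeating offender.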
You've done a lot.
Based on the log you posted, it points to the CPU. Considering that you have done a clean Windows install, updated the chipset drivers and the BIOS (7C91v12), verified RAM stability under prolonged stress tests, and are still getting occasional crashes at stock settings, it looks like some hardware is faulty (either the CPU or the motherboard).
I found a post on this forum from a user with a similar issue; replacing the CPU solved it. Please look through the answers there, I hope it helps.
You are looking in the wrong direction: a similar error often occurs among owners of AMD Navi GPUs. Try temporarily replacing the GPU and check whether the error keeps occurring.
There is a thread on Reddit where people got rid of this error by replacing the GPU: https://www.reddit.com/r/AMDHelp/comments/hq7jcu/cache_hierarchy_error/