Computer Type: Desktop
GPU: Radeon RX 5700XT
CPU: Ryzen 5 3600
Motherboard: MSI B450 A Pro Max
RAM: GSkill Ripjaws 8GB X2 (16GB in total)
PSU: Thermaltake Smart RGB 700W
Case: Midtower with 1 stock fan
Operating System & Version: Windows 10 Pro Version 10.0.19041
GPU Drivers: Radeon Software (Adrenalin) 20.4.2
Chipset Drivers: AMD Chipset Software 2.5.4.352
Hard Disk: SSD - Crucial 1TB M.2 NVMe
Background Applications: Happens irrespective of which applications are running
Description of Original Problem: My newly built PC keeps restarting randomly. Sometimes it runs for 6-10 hours without any issue. Other times it simply restarts when I open an application (browser, tabs, etc.) or a game, and sometimes it just restarts for no apparent reason. Every time it restarts, Event Viewer logs the error below:
"A fatal hardware error has occurred.
Reported by component: Processor Core Error Source: Machine Check Exception Error Type: Cache Hierarchy Error Processor APIC ID: 11
The details view of this entry contains further information."
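For anyone comparing many of these entries, here is a minimal sketch that splits the quoted message into its labelled fields. The function name `parse_whea_message` is made up for illustration, and the code assumes the message text has been copied out of Event Viewer by hand, with the four labels spelled exactly as in the quote above.

```python
import re

# Labels as they appear in the quoted event text; other WHEA events may use
# different labels, so treat this list as an assumption.
KEYS = ["Reported by component", "Error Source", "Error Type", "Processor APIC ID"]


def parse_whea_message(message: str) -> dict:
    """Hypothetical helper: return {label: value} for a pasted WHEA message."""
    key_alt = "|".join(re.escape(k) for k in KEYS)
    # A value runs until the next known label or the end of the current line,
    # so the fields may sit on one line (as pasted above) or on separate lines.
    pattern = re.compile(rf"({key_alt}):\s*(.*?)\s*(?=(?:{key_alt}):|$)", re.M)
    return {key: value.strip() for key, value in pattern.findall(message)}


if __name__ == "__main__":
    text = (
        "A fatal hardware error has occurred.\n"
        "Reported by component: Processor Core Error Source: Machine Check Exception "
        "Error Type: Cache Hierarchy Error Processor APIC ID: 11\n"
        "The details view of this entry contains further information."
    )
    for label, value in parse_whea_message(text).items():
        print(f"{label}: {value}")
```

This only reorganises text that is already in the log; it does not say anything about which component is actually at fault.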
Troubleshooting: I have updated all the drivers, then deleted, reinstalled and updated them again. I checked that the CPU fan, the GPU, the RAM sticks and everything else are properly attached, and all of them seem perfectly seated. I used various software to test the CPU, GPU, RAM, etc., and all came back with good results. I ran a memory test and DISM; both completed without any errors.
This tech site gives some very good tips on what causes WHEA-Logger error 18: Event ID 18: Microsoft-Windows-WHEA-Logger - TechNet Articles - United States (English) - TechNet Wi...
Hi elstaci,
Thank you for your message. I have tried almost everything and am still facing the issue. Now I am running my PC in clean boot. I will test it in clean boot for a few days and see how it goes. Many people have faced a similar issue with the 5700XT and Ryzen CPU combination. I am not sure if it is the CPU or the GPU. Many have changed the graphics card, and many others underclocked their CPU/GPU and adjusted the voltage accordingly. I do not want to do either of these. I don't know what I will finally do, though.
Are you running with your BIOS at defaults, with XMP and PBO off? If not, do so and see if that changes anything. If that doesn't help, uninstall the AMD GPU driver and then, for the sake of testing, just let Windows load its standard driver. See if that changes stability. Hopefully this can lead to you isolating the issue.
I will add that the lion's share of the time, when I have had problems that make a machine restart, it comes back to memory issues. It is, however, very hard to say without some testing.
Also, if you have not already, boot with one stick of RAM at a time and see if one is stable vs the other.
Hi Pokester,
Sorry for the late reply.
I have been testing and was waiting to see whether things worked before replying. I tried with one stick and it didn't help. I changed the PCIe slot of the GPU and still nothing changed. I removed the GPU driver and ran the PC in clean boot, and the restarts stopped. Then I re-enabled the other drivers and used the PC without playing games, and it did not restart. I have found a way to trigger the issue: it happens while loading a graphics-demanding application, whether a game or anything else that puts a heavier load on the GPU. However, it does not happen during gameplay itself. If it happens, it happens during loading, either when I start a game or when I move to a different scene within the game. So I have narrowed the issue down to loading. Also, I never turned XMP or PBO on to begin with.
Now I have contacted the retailer and am trying to push them to exchange it for another card. Not sure if that will work, as lots of people are having the same issue with Ryzen processors and Nvidia cards too.
@elstaci wrote: This tech site gives some very good tips on what causes WHEA-Logger error 18: Event ID 18: Microsoft-Windows-WHEA-Logger - TechNet Articles - United States (English) - TechNet Wi...
Thanks for the update and quick reply. I'll be sure to keep an eye on this thread. I was looking into the same issue and bumped into your thread. Thanks for creating it. Looking forward to a solution.
@Lawrence06 nice find. It doesn't mention an error detected on the Infinity Fabric by the memory controller on the CPU. This could be affected by voltage or clock speeds if using DOCP or speeds over 3200 MHz.
I hate to think it's a RAM speed issue because nobody seems to know about it; even when I asked at the computer store or contacted AMD support, they never said RAM was the issue. They market higher speeds yet mention nothing about them not being compatible or about them voiding your warranty (which I think is BS).
Still something to rule out, but it made no difference for me.
My first CPU was unstable and slowly got more stable with updates, the latest chipset driver I tried was from March and the latest BIOS in April/June before I gave up waiting. Last WHEA error I got was April 1st.
My second CPU was perfect for 2 months, tried to reproduce the issue and couldn't. DOCP wasn't enabled the whole time. Then suddenly it became unstable and died a week later by not being able to boot an OS properly, CPU behaved the same on other PCs. Never got a WHEA error with it.
Tried a third 5800X with the latest BIOS; it crashed within 24 hours and once more the next day, I believe. After updating the chipset driver and setting PBO from Auto to Disabled (the PBO change should have made no difference), it has now been on for over 7 days straight with no issue. It might not be solved yet, since it can be so random, but maybe the chipset driver made the difference.
Either way I've had 3 different CPUs, they all behaved differently.
Hi
I have a temporary fix: going into the BIOS and changing the CPU override voltage to 1.25 V. I also disabled PBO.
But is there a final solution from the manufacturer?
I live in Chile and here it is difficult to do an RMA.
I came across several threads on Reddit with the same error. Most users had an RX 5700 or 5700 XT. This issue cannot be memory related. I have tested my memory many times at stock, with XMP and overclocked, and all tests show 0 errors. Regardless, I bought new RAM on Micron E-die, and the crashes happen with the same frequency.
Are you getting any minidump files created?
pc reboots instantly, so no
Anything in C:\Windows\LiveKernelReports\WHEA?
yes, there are really a lot of dumps in this folder
If you could upload the most recent I could have a look at it.
If you want to try having a look yourself, they can be opened with WinDbg.
You can get it with the Windows SDK, or as WinDbg Preview through the Microsoft Store (it's free).
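If the folder holds a lot of dumps, a short script can pick out the most recent ones to upload. This is a rough sketch: the helper name `newest_dumps` is made up, it assumes the dumps use the `.dmp` extension, and reading folders under `C:\Windows` may require an elevated prompt.

```python
from pathlib import Path


def newest_dumps(folder, limit=5):
    """Hypothetical helper: up to `limit` .dmp files in `folder`, newest first."""
    root = Path(folder)
    if not root.is_dir():
        return []  # folder missing, e.g. no dumps were ever written
    dumps = [p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".dmp"]
    # Sort by last-modified time so the most recent crash comes first.
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]


if __name__ == "__main__":
    # The two folders mentioned in this thread.
    for folder in (r"C:\Windows\LiveKernelReports\WHEA", r"C:\Windows\Minidump"):
        for dump in newest_dumps(folder):
            print(dump)
```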
The program that caused the issue in that dmp file was smss.exe.
Component that reported the WHEA error was the CPU.
smss.exe is the Session Manager Subsystem, a Windows component from Microsoft.
"smss.exe" manages the startup of all user sessions in Windows. The operating system's main thread starts it. "smss.exe" launches processes such as the Win32 subsystem and Winlogon, and sets the system variables. It then shuts the system down once those two processes end. If for some reason they do not end as expected, "smss.exe" causes the system to hang.
This isn't something you can disable; Windows won't run without it.
Are the processor APIC ID numbers listed in Event Viewer all 11?
I just checked the oldest minidump, dated June, and it lists the same process. Windows won't work without the Session Manager Subsystem. The event log contains two errors in a row: the first contains processor APIC ID 11, the second contains processor APIC ID 0. It seems to me that the GPU driver triggers this error, since it doesn't happen on a very old driver or on another GPU. Or it could be a GPU hardware malfunction that only began to show up with newer drivers, since most users do not experience this problem. I played Valorant for about an hour before the last crash. I got a message on Steam and tried to minimize the game to my desktop, after which my PC rebooted. Crashes often occur when I Alt+Tab from a fullscreen game to the desktop.
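To answer the "are they all 11?" question across a long event log, the APIC IDs can be tallied. A small sketch, assuming the event messages have been exported or pasted as plain text; `tally_apic_ids` is an illustrative name, not part of any tool.

```python
import re
from collections import Counter

# Matches the "Processor APIC ID: <number>" field as quoted earlier in the thread.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(messages):
    """Hypothetical helper: count how often each APIC ID appears in the messages."""
    counts = Counter()
    for text in messages:
        counts.update(int(apic) for apic in APIC_RE.findall(text))
    return counts


if __name__ == "__main__":
    sample = [
        "Error Type: Cache Hierarchy Error Processor APIC ID: 11",
        "Error Type: Cache Hierarchy Error Processor APIC ID: 0",
    ]
    for apic_id, n in sorted(tally_apic_ids(sample).items()):
        print(f"APIC ID {apic_id}: {n} event(s)")
```

A tally that always lands on the same ID would be a hint worth mentioning when asking for help; a spread across IDs, as in the two events described above, is less conclusive.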
Hi Fyrel,
If it is not too much to ask, could you please have a look at the below link or the attached file:
Hello there!
I can see a lot of activity here for the WHEA-Logger error and wanted to jump in because this crash is also tormenting my setup.
My story is the same as everyone else's for the most part. The PC will freeze, audio will continue for perhaps 1 second, and usually my monitors will each turn a certain colour, varying between white, pink and green. I have ruled out the GPU and troubleshot everything you might expect. Stress tests passed multiple times. No RAM errors on tests, no storage errors, a clean Windows install didn't help, etc.
Temps are normal for the system. The PSU is brand new; however, I've not been able to test with a different one.
Feel free to breeze over this for specific details, it's lengthy: https://forums.tomshardware.com/threads/random-freezes-restart.3681733/#post-22175151
I ticked it as solved because today I realised there was thermal paste on my CPU pins, not much, but enough to make you believe that was the issue. I've cleaned everything properly; there wasn't enough for me to call it a mess, more of a smudge that lay over no more than 3 single pins. All pins are in good order with no bends down the "aisles". The PC has crashed once after fixing this, with the same style of crash, implying it's not fixed. My crash will happen at least once a day. It can happen while browsing, but mostly while gaming. I say gaming loosely: the crash might happen in a main menu or while loading up a game.
I have some dump files that the system has created, but it hasn't recorded every crash, only a few.
I'll attach the latest one. I took some advice here and ran the verifier command, which has been running for about 45 mins now without a crash.
Drive with dump file and event view file: https://drive.google.com/drive/folders/17jyNonedYWxNlZPmNlxk_pMr8cTZAWaL?usp=sharing
Please help me
Specs:
Ryzen 3700X
RX 580
MSI B450 Tomahawk Max
16GB DDR Corsair RGB Pro 3200MHz
750W Corsair RGB PSU, 80+ rated
Hi all, just to chip in, on a similar system:
Ryzen 3600, RX 5700 XT, ASUS X570-I.
I noticed this issue a couple of weeks ago. At first it would only happen during Cyberpunk 2077: the system would stall for up to a minute, then power cycle with no BSOD.
More recently this would happen even while watching a YouTube video, or in a game of Stellaris, which is far less demanding.
Like others, I set about running the usual: Furmark, Prime95, memtest86. All passed.
Other suggestions here are interesting; I tried most of them, but to no avail.
The thing I think may have fixed it for me, and it sounds silly: check the seating in the PCIe slot, especially if you use a PCIe riser cable.
My system is built in a FormD T1 case, which is great, but the case tends to put a little pressure on the riser cable, and this can move over time. My system is watercooled, and I've been opening it up occasionally to top it up, as there are some small air pockets that are only movable once the system is on.
As the saying I conveniently forgot this time round goes; always blame the cables!
I had another WHEA ID: 18.
But this time I managed to trace the possible cause, which was the one I had suspected: the CLINK cable from the Corsair RM-1000 power supply. The ADATA XPG memory modules arrived to complete the setup with 32GB, so that was my chance to remove the adapter cable Corsair had sent me, which connects both the power supply and the water cooler to a single USB port. I removed this cable and connected only the water cooler, using the cable that came with it.
The error pointed exactly to the USB port the two had shared. I hope I don't have any more problems with that. There is no further explanation for this. It sucks when the system doesn't point out where the problem is. Because the forced reset took longer this time, the system even wrote a folder with some information.
About these crashes, BSODs, etc.: from the experience I have had so far, and from what I have observed others experiencing, we need to separate out the most likely causes and work through them.
They are:
- Overclocking. Undo any overclocking and keep only Boost and XMP active, which eliminates this option;
- Device drivers: power supply, chipset, graphics, mouse, keyboard, water cooler, fans;
- Mainboard and/or monitoring applications. Check whether the DLLs or components are old and/or conflict with other new or old applications;
- Poorly connected or broken cables in your case;
- The device management cables: water cooler, power supply, etc.;
- As a last resort, validate the power supply, CPU, memory (check whether it is on the mainboard's QVL), graphics card and mainboard.
Often the problem will be drivers and software. It will rarely be a hardware problem.
Hi Accn,
True, most of the people having this issue have a Ryzen CPU and a Radeon GPU. I am quite confused and can't say for sure what exactly the issue is. If it were a driver issue, then everyone with this graphics card and the latest driver would have had the issue. If everyone were having the issue, either there would have been a new driver or AMD would have discontinued this product. It could be a CPU issue as well, or even a motherboard issue, as I have seen many threads where the user had a Ryzen CPU and an Nvidia graphics card.
Perhaps the problem lies in the combination of some hardware, which works correctly separately, but is poorly compatible with each other
It could very well be a compatibility issue. However, some people with Nvidia cards also had a similar issue, though far fewer as far as I know. This raises the question of whether it is a processor issue. I will research a bit more if I get a chance and try to find out whether people with an Intel CPU and a 5700XT had a similar issue.
I don't rule out the possibility that it may be a combination of things, as pointed out before, but I think Windows may be a problem too. A few people installed the previous version (1903 or 1909, I am not sure), which apparently solved the problem.
There really was no WHEA-Logger Event ID 18 on the Windows 1909 build; this error started appearing in the event log after the 2004 update, but the crashes were exactly the same. I can't say anything about 1903. When I bought the GPU I already had 1909, so you can give it a try.
accn wrote:
I came across several threads on Reddit with the same error. Most users had an RX 5700 or 5700 XT. This issue cannot be memory related. I have tested my memory many times at stock, with XMP and overclocked, and all tests show 0 errors. Regardless, I bought new RAM on Micron E-die, and the crashes happen with the same frequency.
I understand your point. Anyways, I have finally changed my GPU to RTX 2070 and will test it and see how it goes. Will keep all of you posted.
Unfortunately the WHEA dumps don't contain enough information on what caused the crash.
It's quite common with 0x124 errors I believe.
I would like you both to run verifier and then upload the minidmp produced when you next blue screen.
The full instructions are in the link below, please make sure you understand how to turn verifier off.
Verifier is a diagnostic tool designed to stress test your computer's drivers until they break,
hopefully doing so in a way that produces usable data.
Leave verifier running for up to 48 hours or until your first blue screen then turn it off.
Your system may feel a bit sluggish while it's running.
https://www.tenforums.com/tutorials/5470-enable-disable-driver-verifier-windows-10-a.html
Done. I will report the result below. And thanks again for your participation.
P.S. By the way, I noticed a strange pattern a long time ago. My system crashes in a similar way during the Time Spy Extreme CPU test. It happens regardless of the settings and overclocking of the CPU and RAM. Sometimes the system crashes within a few seconds of the test, and it very rarely runs to the end. But when I run this test separately from the rest of the benchmark, it can run for hours in a loop. The graphics stress test always passes without problems, with 99%+ framerate stability.
Update:
My PC rebooted twice without a blue screen when I tried to launch AMD settings from the desktop. After that I disabled Driver Verifier and decided to run Logitech Gaming Software, and then a BSOD happened:
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
WinDbg points to the HIDCLASS.SYS driver.
If you upload that minidump I will look for the failed driver.
If you want to try it yourself, since it's useful to learn:
Open the dump file with WinDbg.
Type !thread into the command line.
About 10 lines from the bottom you will see the Base and Limit values.
Type
dps limit base
Replacing limit and base with the hex numbers.
This opens up the stack recorded during the crash.
Scroll through the list looking for any drivers that failed.
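To avoid copying the two hex numbers by hand, the `dps` line can be built from pasted `!thread` output. This is only a sketch: `dps_command` is a made-up helper, the sample addresses are invented, and the pattern assumes the output contains `Base <hex> Limit <hex>` on one line, which may vary between debugger versions.

```python
import re

# Hex addresses, optionally with the backtick separator WinDbg sometimes prints.
STACK_RE = re.compile(r"Base\s+([0-9a-fA-F`]+)\s+Limit\s+([0-9a-fA-F`]+)")


def dps_command(thread_output):
    """Hypothetical helper: build the 'dps limit base' line from !thread output."""
    match = STACK_RE.search(thread_output)
    if match is None:
        return None  # no Base/Limit pair found in the pasted text
    base, limit = match.groups()
    # Order follows the instructions above: limit first, then base.
    return f"dps {limit} {base}"


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    sample = ("Stack Init ffffd000235e5c90 Current ffffd000235e5530 "
              "Base ffffd000235e6000 Limit ffffd000235e0000 Call 0")
    print(dps_command(sample))
```

The resulting line is then pasted into the WinDbg command line as described above.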
I think this crash was a coincidence and related to the Logitech software. My system almost never shows a BSOD. Today I installed the 20.Q3 driver for Radeon Pro, and so far today my system has been stable. I will continue to watch.
Thank you. I am in the same boat, and I will do as you pointed to see what the mini dump will show. I tried everything also, undervolted the GPU, clean installation, ran with A-XMP disabled, manual timing, and whatever was possible. My setup is R5 3600, Sapphire RX 5700 XT Pulse and MSI B550 Gaming Edge Wifi. I ran with two M.2 drives, so no SATA cables or DVD drives (a post mentioned that SATA cables could be the problem too).
I followed your instructions, but I didn't find anything apart from a lot of question marks on several lines, and other lines that I believe are commands. I'm sorry, I may be saying something wrong.
Unfortunately quite common with 0x124 errors.
Did you try using verifier to catch any driver issues before they crashed the system?
I haven't, but I can definitely try it.
Do read the instructions on how to turn it off.
Don't want to get caught in a boot loop if it finds a driver error during boot.
It crashed in a few seconds. What should I do next? Thank you for the instructions
Did it create a minidmp in C:\Windows\Minidump?
Nope, it didn't. I did not get a blue screen; it just rebooted every time (I had to restore to an earlier point because the other options to disable verifier did not work). And it rebooted at the stage where the video driver should load (I have almost nothing running on this PC, only Logitech G Hub, AMD Adrenalin and the Windows "stuff" that comes with the installation).