A few other users and I have been suffering sudden reboots on our Ryzen-based systems (Ryzen 3000, but especially 5000) under different load conditions. The quickest way for us to trigger them, however, has been to run software designed to test RAM stability, such as TM5 or Karhu RAM Test.
We have recently discovered that this problem only occurs if we have HWiNFO loaded in the background on Windows 10. Most of us also have AMD Radeon graphics cards, but we have yet to determine whether that is a contributing factor. We don't know exactly where the conflict is, but the pattern is clear: we see the dreaded WHEA-Logger Event ID XX Cache Hierarchy error in the Windows Event Viewer after those sudden reboots.
This has been tested by multiple users across different setups: motherboard manufacturers, AGESA/BIOS revisions, RAM brands and configurations, settings, and even after a fresh install of Windows (including different versions of the operating system). The only common denominator we have been able to find so far is the use of HWiNFO (we've only tested this using the latest versions - we still don't know if it can be solved by rolling back to a previous version).
I'm sharing this information here in the hope that this problem can be reproduced and fixed accordingly. It may also require collaboration with the team behind HWiNFO. For that reason, I have also created another thread in their support forum: https://www.hwinfo.com/forum/threads/is-hwinfo-causing-the-dreaded-whea-logger-event-id-xx-cache-hie...
If anyone else is suffering from this problem and can reproduce it, please chime in and let us know. The more feedback and information we can gather, the better.
Thank you all for your time.
Well, take a look at the link to the HWiNFO support forums I shared in the OP, where a group of users and I are tracking down the issue with the creator of the tool. So far it seems to be related to the GPU sensors of our graphics cards (Navi 21), and we're currently testing a new beta.
Anyone interested, please take a look here: https://www.hwinfo.com/forum/threads/is-hwinfo-causing-the-dreaded-whea-logger-event-id-xx-cache-hie...
I have an Asus Dark Hero motherboard with a Ryzen 5 3600, G.Skill 4x16GB @ 3600 and an RX 5700 XT, and this happened on the very first boot: the PC froze and I had to unplug it to get it working again. Since installing Windows, I get random WHEA errors while the PC is idle.
I decided to update to the latest BIOS, and immediately after the BIOS flash finished the PC restarted automatically a few times. After that the screen went black and the motherboard showed error code 00, which means a CPU error according to the Asus website...
For now I've set PBO to defaults and RAM voltage to Auto instead of 1.35 V, which is the default with XMP loaded at 3600 MHz.
I don't use HWiNFO, I don't even use a Radeon graphics card, and I continue to get BSODs with WHEA errors. All the people I have seen with these problems have different memory, motherboards, BIOS versions, programs, PSUs, etc. The only common factor is a 5000-series processor...
I have a 1070Ti and I see it.
I also have a 3900XT and I see it.
It's not just AMD GPUs and the 5000 series. What I did not know is that HWiNFO can cause it.
I'll be uninstalling that to find out.
I've put together a very complete post along with my troubleshooting efforts and steps. My system has an Nvidia card in it.
It's not just AMD GPUs.
I have a 1070Ti and a 3900XT and I see it.
I rolled back my BIOS to a version from a time when I KNOW I didn't have this happening.
Only 2 things can be true in my case:
1) It can be a BIOS update
2) It might be HWiNFO (some say it's AMD cards; nope, I have Nvidia)
I am beginning to think that this is the case. I am on a 5800X and a 1080 Ti and I am getting random OS lockups/freezes, and it seems to only happen if I have HWiNFO or HWMonitor open for long periods of time.
I went 36 hours with no crashes, then it crashed about 30-60 minutes after opening HWMonitor. Later that day I installed HWiNFO to set up Rainmeter and look at things in more detail to try to troubleshoot the crash, and I had multiple crashes in a 4 hour period. Seems a bit too coincidental to me.
Now sitting at 21 hours without HWMonitor or HWiNFO open, we'll see if it lasts.
I also think it was worse on the AGESA 22.214.171.124 beta BIOS that is available for my motherboard than on the last stable BIOS release, which is AGESA 126.96.36.199, although it happens on both.
The only difference for me is that I don't see the WHEA-Logger cache hierarchy error that often (I have seen it a couple of times). Usually it just locks up, I power cycle, and the only critical/error event in the log is a generic Kernel-Power 41 (unexpected shutdown). I also have not seen the problem occur under load; it seems to only happen at idle or during low workloads like browsing or watching YouTube.
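For anyone who wants to sort their own reboots the same way (Kernel-Power 41 on its own versus Kernel-Power 41 with a WHEA-Logger entry next to it), here's a rough Python sketch. It assumes you have already exported the System log from Event Viewer and parsed it into records. The sample records, the `classify_reboots` name and the two-minute window are all made up for illustration and don't come from any real machine.

```python
from datetime import datetime, timedelta

# Hypothetical sample records (NOT real log data), shaped like parsed
# entries from an exported Windows System event log.
EVENTS = [
    {"time": datetime(2021, 3, 1, 10, 0, 5), "provider": "Microsoft-Windows-Kernel-Power", "id": 41},
    {"time": datetime(2021, 3, 1, 10, 0, 9), "provider": "Microsoft-Windows-WHEA-Logger", "id": 18},
    {"time": datetime(2021, 3, 3, 22, 15, 0), "provider": "Microsoft-Windows-Kernel-Power", "id": 41},
]


def classify_reboots(events, window=timedelta(minutes=2)):
    """Label each Kernel-Power 41 event by whether a WHEA-Logger event sits close to it.

    `window` is an arbitrary assumption about how close in time the two
    entries need to be to count as the same reboot.
    """
    whea_times = [e["time"] for e in events if e["provider"].endswith("WHEA-Logger")]
    results = []
    for e in events:
        if e["provider"].endswith("Kernel-Power") and e["id"] == 41:
            has_whea = any(abs(t - e["time"]) <= window for t in whea_times)
            results.append((e["time"], "whea" if has_whea else "kernel-power-only"))
    return results


if __name__ == "__main__":
    for when, label in classify_reboots(EVENTS):
        print(when.isoformat(), label)
```

If most of your reboots come out as "kernel-power-only", you're probably seeing the same lockup pattern I am rather than the logged cache hierarchy error.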
"I also have not seen the problem occur under load, seems to only happen at idle or during low work loads like browsing or watching youtube"
That's when I see it also.
I'm at the last part of a very long troubleshooting effort that has also been expensive.
It's either the CPU (doubtful now), HWiNFO or the BIOS. I rolled back my BIOS to the version I had when my system was stable. If it gives the cache hierarchy error again, I will be uninstalling HWiNFO.
I have an Nvidia card (1070Ti) and I see it happen with my 3900XT.
Same here. I've tried everything I can think of or find, and there is no reason to suspect the other components in my system. They are high quality components I had been using in my previous gaming rig (a completely stable Intel-based system), and they don't just suddenly become unstable the day I swap the board/CPU/RAM. I am actually on my second RAM kit too (returned an Intel-optimized kit for an AMD-optimized kit) with no change to the problem, so I know for sure it's not the RAM.
I do believe that, at least in my case, it is BIOS/AGESA related. I know some people who see the WHEA 18 error a lot had success RMA-ing their processor. Given my results below, I don't think it's a hardware issue in my case, so I'm patiently waiting for Gigabyte to release a stable BIOS on the latest AGESA. In my opinion, if it were a bad board or CPU, the instability would present under load. This is more like a voltage/power control problem at low load, which screams BIOS to me.
Anecdotally, I am now sitting at 42 hours of uptime with no crashes, without HWMonitor/HWiNFO running. Varied use, mostly idle/low load and some gaming.
I am RMA'ing the processor.
I did roll back to BIOS 3602 and a problem that I thought was gone has returned.
I've always randomly gotten a black screen and the TV would reboot. I thought it was my video card (1070Ti), and it is not.
This time I fished the dreaded WHEA event out of the Event Viewer. So for me it's RMA time; I'm convinced now that it is the processor. I gave it a real try hoping it was the BIOS, and it may yet have contributed, but I doubt it now.
Here's my updated post on the matter. I'm calling AMD Monday to initiate the RMA. There's not much they can do at this point to help try to resolve it. LOL!!!!
I didn't even have HWiNFO and I got those reboots happening randomly once every couple of days. I have a Ryzen 1700X. I've also seen BIOS code 33 (yes, the cache problem) flashing for a split second on my motherboard's LED display after that reboot. It might be Windows 11 preview instability, but it also might be my Ryzen overheating, because the core temp (BIOS LED) is showing around 70° under moderate load, so I should probably reapply the thermal paste.