Here's a description and then a timeline of my issue:
My PC started crashing while idling (with the screen turned off). I don't remember exactly when I updated my GPU drivers, but I do remember updating them to Adrenalin Edition 24.6.1 not long before. However, it was fine for at least a few days before the issues started popping up and then devolved into OS instability.
Specs:
CPU: AMD Ryzen Threadripper 2950X
M/B: ASUS ROG Zenith Extreme Alpha, BIOS 2701
GPU: AMD Radeon RX 6900 XT
Latest GPU Driver Version: 24.7.1
07/15:
GPU driver is 24.6.1, BIOS is 2101. The monitors are black, and PC seems to have crashed while idling overnight (I routinely leave the computer on overnight to idle after locking it). Jiggling the mouse or typing on the keyboard doesn't wake it up. This is the first day it starts to happen. Turning the PC back on, it initializes to POST code 97, with text "Test NVRAM". It remains stuck here unless I shut it down. Decide to shut it down, turn off PSU, turn on the PSU, and try again. This time it boots.
07/16 - 07/18:
Issue persists, but degrades from only occurring overnight to happening if left for a few hours. I suspect a PSU failure and contact my PSU manufacturer about a possible replacement. I still haven't received an update at the time of this posting. I can still play video games and do intensive work, but if I leave the computer without inputting anything for 1-2+ hours, it seems to crash, and it doesn't output any video. I read some posts about USB Selective Suspend and think that might be an issue, so I go into my power settings to enable it. This does not resolve the issue.
07/19:
I read some posts and decide that maybe the BIOS is out of date, even though this issue never occurred during the past year or so on the current BIOS. Still, no harm in updating it, so I download and upgrade the BIOS from 2202 -> 2701. The problem appears to go away: the PC did not crash overnight, and my RAM is able to run at 3000 MHz (its rated spec) for the first time in years. However, I quickly change it back to 2800 MHz (which is what it normally runs at) in the following days, following further crashes.
07/20:
I jiggle the mouse, and the PC does wake up, only to crash in the time it takes for it to idle. I come back to blank monitors, and I power cycle the PSU. I get some new POST codes when I try to boot it without power cycling the PSU first, like 92 and 94. This makes me think that the GPU driver might be out of date, so after turning it on I update the driver to 24.7.1. I do this by having the AMD Software reset to factory settings (checking the box) and then installing it, after which I reboot the PC at the prompt. This doesn't fix the issue. I've updated my drivers this way for a long time and I haven't had this issue before, but maybe this is causing them to be corrupted.
07/21 - 07/23:
Issue persists, but now the PC crashes after being left idle for 20-30 minutes. I check the power settings, and it just seems to crash whenever the monitor turns off, even though I set the PC to not turn off (and this issue has never happened to me before). Power cycling the PSU turns out to not always solve the problem, as sometimes the PC boots into POST code 92. As a result, to be safe I enable hibernation and put the PC into hibernation before I go to sleep. This seems to work around the problem.
07/24:
I run a complete Windows Defender scan on my drives. My PC completes this successfully over the course of 4 hours without issue, and the scan detects no problems. It had no issues and I thought
07/25 (Today):
The PC has BSODed multiple times. At the start of the day, when I tried to wake it up from hibernation, the PC appeared to wake up, but nothing showed up, suggesting that it crashed while hibernating (that is my guess). Checking Event Viewer, it appears to have "shut down unexpectedly at 03:41". After that, I turned it on again and it worked normally, until it stopped opening applications. As in, I would click on them, and nothing would happen. I checked Task Manager and watched as Discord simply hung in the background, while my web browser and other applications started up and shut down immediately. Here are the BSOD errors, courtesy of WhoCrashed (free software for reading dump files):
All of these were minidumps, except number 5, which was a full memory dump. I have tried using "dism /online /cleanup-image /restore-health" and "sfc /scannow" as an administrator, but the former just gives me error code 87 and the latter will reach 100%, then say that it cannot complete the requested task. I've tried turning off D.O.C.P. memory overclock and downclocking my RAM to 2800 MHz, but that has proved unstable as well. My CPU has not been overclocked for a few years, though that was not due to stability issues but because performance in some games dropped: I had it locked at 4100 MHz, but those games benefitted from single cores boosting higher.
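A note on the DISM error: Windows error 87 means "the parameter is incorrect", so the hyphenated "/restore-health" is the likely culprit. The switch is normally written "/restorehealth", with no hyphen. The small Python sketch below only illustrates that comparison; the helper name and the short list of known switches are assumptions made for this example, not part of DISM itself.

```python
# Sketch: flag switches in a DISM /Cleanup-Image command that this example
# does not recognize. The three health switches are real DISM options;
# the helper function is hypothetical and exists only for illustration.
VALID_HEALTH_OPTIONS = {"/checkhealth", "/scanhealth", "/restorehealth"}


def unknown_dism_options(command: str) -> list[str]:
    """Return the switches in `command` that are not in the known set."""
    known = {"/online", "/cleanup-image"} | VALID_HEALTH_OPTIONS
    switches = [tok.lower() for tok in command.split() if tok.startswith("/")]
    return [s for s in switches if s not in known]


typed = "dism /online /cleanup-image /restore-health"
fixed = "dism /online /cleanup-image /restorehealth"

print(unknown_dism_options(typed))  # the hyphenated switch is reported
print(unknown_dism_options(fixed))  # nothing is reported
```

If the corrected command still fails, the cause is something other than the spelling of the switch.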
Update: While writing this, I decided to downclock my RAM to 2400 MHz, and running an "sfc /scannow" scan did not seem to find any integrity violations. I still have D.O.C.P. memory overclocking disabled. It has not crashed, and I will update if there are further issues. Updating the BIOS may have caused an issue with the RAM stability, but I'm not sure, though this happened in the past when I first built the PC in 2020. I also disabled CSM in my BIOS settings, since I don't remember enabling it and Windows 10 does not require it to be enabled in order to boot. Disabling CSM did not resolve my stability issues, but I wanted to mention changes I made to the BIOS.
I never got BSOD, but other driver crashes I get all the time with any of the 2024 drivers.
I think the aggressive power management for idle causes these issues. Especially if you use HDR.
The only viable solution is to use 23.12.1 drivers for me.
07/26 Update:
I reset the CMOS on my motherboard yesterday, so that the PC ran at default settings: no memory speed faster than 2133 MHz, no CPU overclock. I then used AMD Cleanup Utility to remove all the drivers, and redownloaded AMD Adrenalin Edition to install 24.7.1. No BSODs or anything like that. However, when I left the PC locked and idle overnight (intentionally, to test), I woke up to a screen that received no display signal when I tried to wake it up, just like when the problem started. I turned off the PC, and then turned it back on without resetting the PSU. The PC booted just fine and got past POST, but there was no display out to the monitor. My GPU is confirmed working as the fans are spinning. I've tried unplugging and replugging the HDMI and DP cables to my monitors, but that doesn't resolve the issue. I don't know what else to try or what else might be the problem.
I want to add that when I boot the PC, since 2701 there would be a brief flash of the ROG logo when the PC turns on, plus the phrase "Press DEL or F2 to initialize boot menu" or something like that if the GPU has an output, but it doesn't do that. Furthermore, when I disconnect a peripheral (like a keyboard or mouse) and plug it back in, the RGB lights do not come back on after some time (though if I do it shortly after the mobo stops displaying POST codes, they still turn on, indicating that the motherboard is able to detect them and send power to them). When my PC boots, it does all the checks, and out of the codes it sticks to B4 briefly before going to 92 during the VGA checks, and then A2 is the last code when checking the HDD/SSD before the PC (apparently) boots into Windows.
I turned off the PSU and turned it back on, and my GPU finally has a display out to the monitor.
Have you searched the web for 'Test NVRAM'? Maybe also post a query on the ASUS ROG forum.
I have searched online for the "Test NVRAM" issue; however, the reported root causes are inconsistent. They've been everything from CPU and GPU failures to Windows settings and driver issues. My CPU, though old, has not had any issues up until this point. My GPU is 2 years old, and I've never had any issue with it. All my hardware was bought new, so I know how it usually behaves. My RAM is technically 2 sets of 2x 8 GB DDR4-3000 G.Skill Trident Z Royal, which haven't given me any issues before (other than not being stable above 2800 MHz). However, the issue persists even with a RAM speed of 2133 MHz, so I'm not sure if the RAM is the issue.
My googling of ASUS and POST code 97 makes it sound like there is an issue with the GPU. One person said it was a hardware issue with the GPU, and an RMA replacement fixed it.
As far as sfc and dism go, I always do a chkdsk with repair for any issues before those commands. Then I run sfc and dism. If the disk is messed up, the repair can still have issues.
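The order described above (disk check first, then the system file checks) can be sketched as a short Python script. This is only an illustration under a few assumptions: "C:" is an assumed drive letter, the function name is made up for this example, and the script defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess

# Order from the reply above: repair the disk first, then check system files.
# "C:" is an assumed drive letter; /f, /scannow and /restorehealth are the
# standard repair switches for chkdsk, sfc and DISM respectively.
REPAIR_STEPS = [
    ["chkdsk", "C:", "/f"],
    ["sfc", "/scannow"],
    ["dism", "/online", "/cleanup-image", "/restorehealth"],
]


def run_repair_steps(dry_run: bool = True) -> list[str]:
    """Print each step in order; only execute them when dry_run is False."""
    printed = []
    for step in REPAIR_STEPS:
        line = " ".join(step)
        printed.append(line)
        print(line)
        if not dry_run:
            # Requires an elevated prompt on Windows; stops at the first failure.
            subprocess.run(step, check=True)
    return printed


steps = run_repair_steps()  # dry run: nothing is executed
```

On a system drive, "chkdsk /f" usually cannot lock the volume and offers to schedule the check for the next restart, so running the steps by hand in this order is often simpler than scripting them.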
I prefer to disable USB selective suspend since it can cause other issues in my experience.
If I were in your situation, I would make sure the Windows drive and installation are fully repaired and in good shape. I would try an old GPU. Also, make sure all connections to the GPU are fully seated. It just hit me that I also recommend turning off fast boot in the BIOS and fast startup in Windows. In my experience, these two things can hang onto old/wrong drivers.
When installing a driver in your situation, this is what I do: download the GPU driver I want. Disconnect from the internet. Use DDU to clean out all AMD GPU drivers. Install the GPU driver. Restart. Reconnect to the internet. Disconnecting from the internet ensures that Windows doesn't try to install a GPU driver on its own.
08/01 Response
Took your advice to run "chkdsk", which seemed fine for the most part until I got a message which said "The Volume Bitmap is incorrect". So I did the suggested thing and ran "chkdsk /scan", and it came up with no issues. I'm not sure if chkdsk has some blind spots or if it really fixed the issue. I also did "sfc /scannow", which turned up nothing as well.
C:\WINDOWS\system32>chkdsk
The type of the file system is NTFS.
Volume label is Boot Drive.
WARNING! /F parameter not specified.
Running CHKDSK in read-only mode.
Stage 1: Examining basic file system structure ...
1668352 file records processed.
File verification completed.
Phase duration (File record verification): 9.50 seconds.
27979 large file records processed.
Phase duration (Orphan file record recovery): 0.00 milliseconds.
0 bad file records processed.
Phase duration (Bad file record checking): 1.13 milliseconds.
Stage 2: Examining file name linkage ...
3054 reparse records processed.
2489746 index entries processed.
Index verification completed.
Phase duration (Index verification): 34.75 seconds.
0 unindexed files scanned.
Phase duration (Orphan reconnection): 37.25 seconds.
0 unindexed files recovered to lost and found.
Phase duration (Orphan recovery to lost and found): 0.74 milliseconds.
3054 reparse records processed.
Phase duration (Reparse point and Object ID verification): 13.24 milliseconds.
Stage 3: Examining security descriptors ...
Security descriptor verification completed.
Phase duration (Security descriptor verification): 39.56 milliseconds.
410698 data files processed.
Phase duration (Data attribute verification): 0.67 milliseconds.
CHKDSK is verifying Usn Journal...
36488408 USN bytes processed.
Usn Journal verification completed.
Phase duration (USN journal verification): 90.02 milliseconds.
The Volume Bitmap is incorrect.
Windows has checked the file system and found problems.
Please run chkdsk /scan to find the problems and queue them for repair.
976100351 KB total disk space.
647505388 KB in 1155142 files.
880548 KB in 410699 indexes.
0 KB in bad sectors.
1809715 KB in use by the system.
65536 KB occupied by the log file.
325904700 KB available on disk.
4096 bytes in each allocation unit.
244025087 total allocation units on disk.
81476175 allocation units available on disk.
Total duration: 1.36 minutes (81674 ms).
C:\WINDOWS\system32>chkdsk /scan
The type of the file system is NTFS.
Volume label is Boot Drive.
Stage 1: Examining basic file system structure ...
1668352 file records processed.
File verification completed.
Phase duration (File record verification): 10.71 seconds.
27979 large file records processed.
Phase duration (Orphan file record recovery): 0.00 milliseconds.
0 bad file records processed.
Phase duration (Bad file record checking): 0.88 milliseconds.
Stage 2: Examining file name linkage ...
3054 reparse records processed.
2489746 index entries processed.
Index verification completed.
Phase duration (Index verification): 33.76 seconds.
0 unindexed files scanned.
Phase duration (Orphan reconnection): 37.26 seconds.
0 unindexed files recovered to lost and found.
Phase duration (Orphan recovery to lost and found): 1.51 milliseconds.
3054 reparse records processed.
Phase duration (Reparse point and Object ID verification): 15.84 milliseconds.
Stage 3: Examining security descriptors ...
Security descriptor verification completed.
Phase duration (Security descriptor verification): 41.15 milliseconds.
410698 data files processed.
Phase duration (Data attribute verification): 0.88 milliseconds.
CHKDSK is verifying Usn Journal...
36569912 USN bytes processed.
Usn Journal verification completed.
Phase duration (USN journal verification): 94.89 milliseconds.
Windows has scanned the file system and found no problems.
No further action is required.
976100351 KB total disk space.
647677020 KB in 1155145 files.
880548 KB in 410699 indexes.
0 KB in bad sectors.
1809715 KB in use by the system.
65536 KB occupied by the log file.
325733068 KB available on disk.
4096 bytes in each allocation unit.
244025087 total allocation units on disk.
81433267 allocation units available on disk.
Total duration: 1.36 minutes (81925 ms).
Here are the results of the scan using chkdsk.
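As a rough sanity check on the numbers in the first (read-only) run above, the summary lines can be added up to see whether they agree with each other. The Python sketch below copies those lines from the output and checks two things: that the per-category figures sum to the reported total, and that the free allocation units times the unit size match the reported free space. The helper name is an assumption for this example, and agreement here does not prove the volume bitmap is fine; it only shows the summary is internally consistent.

```python
import re

# Summary lines copied from the first (read-only) chkdsk run above.
SUMMARY = """\
976100351 KB total disk space.
647505388 KB in 1155142 files.
880548 KB in 410699 indexes.
0 KB in bad sectors.
1809715 KB in use by the system.
325904700 KB available on disk.
4096 bytes in each allocation unit.
81476175 allocation units available on disk.
"""


def first_int(pattern: str, text: str) -> int:
    """Return the first captured integer for `pattern`, or raise if absent."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


total_kb = first_int(r"(\d+) KB total disk space", SUMMARY)
files_kb = first_int(r"(\d+) KB in \d+ files", SUMMARY)
index_kb = first_int(r"(\d+) KB in \d+ indexes", SUMMARY)
bad_kb = first_int(r"(\d+) KB in bad sectors", SUMMARY)
system_kb = first_int(r"(\d+) KB in use by the system", SUMMARY)
free_kb = first_int(r"(\d+) KB available on disk", SUMMARY)
unit_bytes = first_int(r"(\d+) bytes in each allocation unit", SUMMARY)
free_units = first_int(r"(\d+) allocation units available", SUMMARY)

# Files + indexes + bad sectors + system use + free space should equal the total.
accounted_kb = files_kb + index_kb + bad_kb + system_kb + free_kb
# Free allocation units converted to KB should equal the reported free space.
free_from_units_kb = free_units * unit_bytes // 1024

print(accounted_kb == total_kb)
print(free_from_units_kb == free_kb)
```

Both comparisons hold for the figures above. A read-only chkdsk on a volume that is in use may report problems that a follow-up "chkdsk /scan" does not reproduce, which would be consistent with the second run coming back clean.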
I want to also say that when I initially ran DISM it did nothing for me. It just gives me an error code when I try to run it. To be fair, I got it from some article that was probably out of date. So instead, I found a new article that gave me correct information.
C:\WINDOWS\system32>dism /online /cleanup-image /checkhealth
Deployment Image Servicing and Management tool
Version: 10.0.19041.3636
Image Version: 10.0.19045.4717
No component store corruption detected.
The operation completed successfully.
C:\WINDOWS\system32>dism /online /cleanup-image /scanhealth
Deployment Image Servicing and Management tool
Version: 10.0.19041.3636
Image Version: 10.0.19045.4717
[==========================100.0%==========================] No component store corruption detected.
The operation completed successfully.
After that, I did the scans, which turned up nothing as well.
I also turned off USB Selective Suspend in my current power profile, which is the default "High Performance" profile. I also checked the power setting Link State Power Management under PCI Express, which is set to off. I'm not sure if this might have something to do with it, and I don't remember touching it, so I left it as is, but I wanted to note its status for obvious reasons. I appreciate the help!
Edit:
I disabled Windows Fast Boot in my power settings, as well as turning off Fast Boot in my BIOS. After turning off Windows Fast Boot, I shut down the system. Then, I powered it on, went into Windows (because I didn't hit F2 hard enough I guess), restarted, and got into the BIOS. I disabled Fast Boot, as well as disabling CSM again to reenable Resizable BAR. When I saved the changes and exited the BIOS, the mobo did another POST, and when I got into Windows I got a message saying that the devices had changed or whatever and I had to restart. Before I restarted, however, I checked AMD Software to see if AMD SAM was "supported", and it was not. After the final restart, I checked AMD Software again and AMD SAM was indeed running. Now the test will be to see if the same self-shutdown while idling issue occurs.
Disabling both Windows Fast Boot and the BIOS Fast Boot did not negatively impact my startup times, since I have an SSD. I feel like it might have been a little faster, but it's not like I was timing the boots.
07/31 Update
A couple of things have happened over the past couple of days that I thought were interesting.
Computer seemed to have turned itself on for an update on 07/27 after I shut it down for the weekend on 07/26, and ran until it unexpectedly shut down on 07/29 at 3:25 AM. I'm fairly certain that I turned it off before I left (a regular shutdown, not hibernate nor sleep), but if I actually didn't, this doesn't change much about the unusual self-shutdown behavior.
What is interesting to note is that today when it shut down unexpectedly at 8:26 PM, I tried to reboot it by just pushing the power button twice, and both times got POST Code 92 with the same text "TEST NVRAM"; then I decided to simply reset the CMOS by pushing the button and that seemed to allow the computer to properly boot again. I will write further on whether simply doing a CMOS reset can be a temporary workaround for the issue so I don't have to power cycle the PSU every time.
I will note that this time, when I was sent into the BIOS (due to the CMOS reset), I touched nothing. Previously on 07/25, I changed the CSM setting to disabled in order to enable Resizable BAR (i.e. AMD SmartAccess Memory, since I have an RX 6900 XT and a Ryzen Threadripper 2950X; I also forgot to mention that I disabled CSM to enable Resizable BAR in the original post). I am aware that officially one needs at least a Ryzen 5000 series CPU and a Radeon RX 6000 series GPU to enable it, though AMD Software seems to allow me to do this, assuming I set the BIOS settings properly. To be clear, AMD SAM is disabled because I did not touch anything in the BIOS for this CMOS reset. As far as I remember, I have not had any issues with SAM being enabled, though I don't remember the last time I actually checked to make sure it was enabled. I will report further to see if the problem persists.
Could it be that the cmos battery is dying?
Sorry for the late reply, but I have not tested the CMOS battery. My motherboard is 4 years old, which seems a little early for a dead CMOS battery but I could also be entirely wrong.
Getting a black screen when the PC is left to idle has become the norm when using the waste of money that is the 7900 XT AMD GPU and their drivers. It's been years and AMD is just ignoring it, more preoccupied with adding trinkets than reliability. You know when my display stays on and/or PC comes back from sleep all the time ... when it has the NVIDIA GPU instead...same BIOS config, huge user experience boost. So don't waste your time with CPU and batteries...it's the driver or the card
My GPU is an RX 6900 XT, which didn't have this issue for the 2+ years I've been using it. All the older drivers were fine, until 07/15 when it started having this behavior while using driver version 24.6.1. My card is still able to play games and function normally in every case unless left to idle for any amount of time.
Yes, the same issues come up with many users of the 7900XT. As you say, the card was good for 2 years so it's more than likely the driver that doesn't deliver. As previously mentioned the 23. series was a bit more stable. Like you, I got the expected performance in games but the lack of stability of the entire system just isn't worth it for my use case. AMD needs to test and validate their drivers more than open the computer, play a game, shut it down...
Computer had another unexpected shutdown, so I had to reset CMOS to get it to boot again. It gave POST Code 92 with the "TEST NVRAM" message. I disabled Fast Boot in the BIOS again, as well as CSM, and enabled Resizable BAR. When I left the BIOS to restart the system, it booted into Windows without issue, and AMD SAM was activated. I highly doubt that AMD SAM being activated is part of the issue, but I can disable it to see if the problem persists. When I have time, I'll use DDU to do a complete driver wipe and reinstall everything, though if the issue continues after that it could be something wrong with the 24.6.1 and 24.7.1 drivers.
Discovered some leftover Nvidia drivers in my system back from when I had Nvidia GPUs on the computer. Uninstalled them, left the PC on overnight. When I woke up, however, the PC was completely shut off. I'm not sure if this was because of the PC itself, or if there was an actual power delivery issue that caused the PC to be offline (this has happened before during stormy weather). Either way, the PC was completely off this time, as all the RGB lights were off except the motherboard accents, which are always on so long as the PSU is powered on. I press the power button, and it boots up without any issue (despite the power loss). Looking into the Event Viewer, I get the same vague error of an unexpected shutdown (Event ID 41). I'll also note that the PC did not crash at all yesterday, on 08/02, despite leaving it to idle for 3-4 hours at a time.
I will test this again tonight and see if the results improve (no unexpected shutdowns) or are replicated (it completely shuts itself off).
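For anyone who wants to pull the Event ID 41 entries mentioned above without clicking through Event Viewer, here is a minimal Python sketch that builds a query for the built-in wevtutil tool. The function name and the default of five entries are assumptions for this example; the command is only printed here, and the commented-out lines show how it could be run on Windows.

```python
def kernel_power_41_query(count: int = 5) -> list[str]:
    """Build a wevtutil command for the newest Event ID 41 entries in the System log."""
    return [
        "wevtutil", "qe", "System",
        "/q:*[System[(EventID=41)]]",  # XPath filter on the event ID
        f"/c:{count}",                 # maximum number of events to return
        "/rd:true",                    # newest entries first
        "/f:text",                     # plain-text output
    ]


cmd = kernel_power_41_query()
print(" ".join(cmd))

# On Windows, from an elevated prompt, the query could be run like this:
# import subprocess
# print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```

Event ID 41 only records that the system restarted without a clean shutdown, so on its own it cannot distinguish a power cut from a hard hang. Comparing the timestamps across several nights may still help separate a one-off outage from a recurring crash.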
After sudden power loss, I always do the chkdsk, sfc, dism thing to make sure everything is still good because corruption can cause strange issues.
It's been a week, and there haven't been any issues. I've left my computer on running continuously with these settings:
2133 MHz RAM
AMD SAM Enabled (Disabling CSM and Enabling Resizable BAR in BIOS)
Disabled USB Selective Suspend in Windows
Disabled Fast Boot in BIOS
AMD Software: Adrenalin Edition 24.7.1
Driver Version: 24.10.29.01-240714a-405203C-AMD-Software-Adrenalin-Edition
Every time I leave it on overnight, my PC is able to wake up from idling without issue. It appears that the issue really was the Nvidia driver that had been leftover, and once it was removed there have been no other issues with my PC shutting down after being left to idle (so that shutoff I woke up to a week ago was in fact an issue with the power grid and not the PSU or drivers). In fact, it's been on for a week straight without problems, just like it used to be.