I just wiped my whole machine and started fresh with Windows. The issue still persists. The power plan is at its default, AMD Ryzen Balanced, and I haven't applied any tweaks yet, such as unparking CPU cores or optimizing Windows for gaming.
Steps I've taken so far to rebuild the machine:
-Clicked "Check for updates" until nothing more downloaded
-Installed the chipset driver
-Installed the Adrenalin driver
-Created a System Restore point
-Installed Chrome with the DuckDuckGo extension
-Installed Ubisoft Connect and downloaded Assassin's Creed Valhalla
-Loaded a save where I know I can reproduce the issue. It occurred within 2 minutes.
I'm more than happy to entertain suggestions at this point. Discord ID: Spladian#9998.
Windows: Windows Home, fresh install as of 12/30/2020
Mobo: Gigabyte Aorus Extreme 570 with the latest BIOS update
GPU: PowerColor Red Devil 6900 XT
RAM: Trident Z Neo DDR4-4000 CL18, replaced with:
RAM: Trident Z Neo DDR4-3800 CL14 (approved RAM for the motherboard); this RAM is 2 days old
PSU: EVGA SuperNova 1200W Platinum
So, I ended up disabling Fast Boot and manually raising the GPU fan curve in Adrenalin.
I had a feeling my GPU was a little "too" quiet. After doing this, I was able to play all night without a crash. *Knocks on wood*
Try cranking the fan curve up and see if that helps.
I'm starting to wonder whether the WHEA errors showing in HWiNFO64 actually indicate anything.
If I cold boot the system, they disappear. If I restart the system, they appear at a rate of one every two seconds on the desktop. Once in game, they level out to about one a minute over the course of an hour of play. I haven't touched anything in the Radeon software except enabling Adaptive Sync.
The default profile runs at:
Fan speed: 1300 RPM
Temperature: 69°C; junction temperature: 80°C
Everything goes fine until I start manually adjusting these values; then the errors become much more frequent.
The hard pill to swallow here is that once I click any of the buttons on the tuning page, the issues begin, and going back to the default values doesn't make them go away; they persist. This was before reformatting Windows, so I'll play for a day to see whether things are mostly stable now, and report back if I see any errors after manually setting the fan curve.
Appreciate all the suggestions.
AMD RDNA, and now RDNA2, cards are also known for random high voltage spikes, often going beyond what even a quality PSU that should be sufficient according to the recommendations can handle. You can run a monitoring tool with logging enabled and check whether you are getting a voltage drop at the moment the problems happen.
You can also test the GPU and PSU with OCCT from ocbase.com and see whether you can force the issue.
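One way to act on the logging suggestion is to export the sensor log to CSV and scan the 12 V rail for readings below the ATX tolerance of ±5% (about 11.4 V). Below is a minimal Python sketch, assuming the log is a CSV with a "Time" column plus a rail column whose header you pass in; the "+12V [V]" header and the sample values are made up for illustration, so check your own log's headers. Also keep in mind that a reading every couple of seconds will likely miss very short transients, so a clean log doesn't fully clear the PSU.

```python
import csv
import io


def find_voltage_dips(csv_text, column, nominal=12.0, tolerance=0.05):
    """Return (time, volts) pairs where `column` falls more than
    `tolerance` (a fraction, 0.05 = 5%) below `nominal`.

    Assumes a CSV with a header row, a "Time" column, and the rail
    column named by `column` (both column names are assumptions).
    """
    floor = nominal * (1.0 - tolerance)
    dips = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            volts = float(raw)
        except ValueError:
            continue  # skip blank cells and any repeated header/footer rows
        if volts < floor:
            dips.append((row.get("Time", ""), volts))
    return dips


# Made-up sample log, not real data.
sample = """Date,Time,+12V [V]
30.12.2020,21:00:00,12.096
30.12.2020,21:00:02,11.320
30.12.2020,21:00:04,12.048
"""

print(find_voltage_dips(sample, "+12V [V]"))
```

Lining up any dip timestamps with the time of a crash is what would point toward the PSU.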
WHEA errors only show up when I restart Windows; after a cold boot, they are not there. HWiNFO's polling rate is 2000 ms, which is probably why the errors appeared to come in at one every two seconds.
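The polling-rate explanation can be checked with simple arithmetic. If a cumulative error counter goes up by exactly one on every sample at a 2000 ms interval, the apparent rate is 30 per minute, one per poll, which suggests the count may be tied to the polling itself. The in-game rate of roughly one per minute, by contrast, would leave most consecutive samples unchanged. A minimal sketch, assuming you have the counter's value at each sample; the list below is made up:

```python
def errors_per_minute(cumulative_counts, interval_s=2.0):
    """Average rise of a cumulative error counter, in errors per minute."""
    if len(cumulative_counts) < 2:
        return 0.0
    rise = cumulative_counts[-1] - cumulative_counts[0]
    elapsed_min = (len(cumulative_counts) - 1) * interval_s / 60.0
    return rise / elapsed_min


def rises_every_poll(cumulative_counts):
    """True if the counter increased between every pair of consecutive samples."""
    return len(cumulative_counts) >= 2 and all(
        later > earlier
        for earlier, later in zip(cumulative_counts, cumulative_counts[1:])
    )


# Made-up example: one minute of desktop logging at 2000 ms, +1 per sample.
desktop = list(range(31))
print(errors_per_minute(desktop))
print(rises_every_poll(desktop))
```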
I played some games today and things were stable: 3 hours with no crashes. I'll chalk it up to a registry or driver issue, since that Windows install had been on my machine for 2 years.
Thanks, all, for trying to help. Starting troubleshooting from the beginning is always best, and Windows installs are no different; it's mostly avoided because of the hassle of rebuilding everything.
Shame on me for not picking up on this, but I'm trying to understand what your "solution" was. I'm having similar crashes with a 5700 XT, so I'm going across threads looking for any info. Was the full Windows wipe what helped you? And if so, did you keep ANY files, or did you wipe everything?
Sorry for not understanding what you did to stabilize your system.
Hopefully I can add to this conversation: I was having system crashes (total system lockup with frozen video, requiring a hard reset) after moving from an NVIDIA 2080 to an AMD 6900 XT.
I was crashing in Call of Duty: Modern Warfare as well, and later in a web browser... hmm
Setup: i9-9900K CPU, 64 GB RAM, ASRock Z390 Taichi Ultimate motherboard, a few NVMe drives.
What I found so far:
I was crashing in games like Modern Warfare after sitting idle in the lobby for a while. Thinking this was due to overclocking, I went back to the default AMD settings, which resulted in even more system crashes. Hmm, okay... now why am I crashing even more???
Here is what I think happened:
Modern Warfare most likely has a memory leak and/or is requesting too much VRAM (more on that in a sec). Thinking the crashes were a result of overclocking, I moved to the default AMD settings, which unfortunately set a very low fan curve. That lets the junction and core temperatures rise. While that's okay for the video card, it throws some serious heat off the GPU's metal backplate. Okay, and...
With my particular motherboard, the GPU sits very close to the system memory and the M.2 storage slots. I started seeing heat warnings in HWiNFO64 for my NVMe drive... wtf. So I moved the GPU down to the secondary PCIe x16 slot to put some distance between it and the memory, etc. Then I went back into the GPU settings, cranked the fans up to 100%, and disabled Zero RPM.
Back to Modern Warfare: I put a cap on video memory usage. I went into documents/call of duty modern warfare/players/adv_options.ini and changed the video memory scale to 0.5 for now.
Now the game is pulling 9 GB of VRAM. I'll keep an eye on that and restart the game if it goes above 10 GB.
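The "restart if it goes above 10 GB" rule is easy to apply to a logged series of VRAM readings. A minimal sketch, assuming the samples are (time, megabytes) pairs and that 1 GB = 1024 MB; the readings below are made up:

```python
def first_over_budget(samples, budget_gb=10.0):
    """Return the first (time, vram_mb) sample above the budget, or None."""
    budget_mb = budget_gb * 1024  # assumes binary GB (1 GB = 1024 MB)
    for time_label, vram_mb in samples:
        if vram_mb > budget_mb:
            return time_label, vram_mb
    return None


# Made-up readings: about 9 GB after the cap, creeping upward over a session.
readings = [("20:00", 9216), ("20:30", 9830), ("21:00", 10496)]
print(first_over_budget(readings))
```

Which sensor holds the GPU's dedicated memory usage depends on the monitoring tool, so the readings would need to be exported from whatever your tool logs.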
NOTE: I'm still trying to set up baseline testing, but I was at least able to get a few hours of gameplay without issues, and there are no more system crashes while sitting at the desktop doing nothing. So we're making progress, I guess.
I'm trying to narrow down what appear to be Radeon application crashes in the event logs. What I'm noticing now are Windows Error Reporting entries when I come back to the PC and turn on my monitors. I don't run any type of power management, and my PC runs all day (I never turn it off unless patching). So I'm wondering whether something is happening with display detection, but I haven't gotten that far in my testing yet.
Triple-monitor setup (2K monitors, LG-850b): 2 monitors run DisplayPort, and the 3rd runs a Type-C to DP cable.
NOTE 2: Usually, after hard system crashes that require a power off, I have to pull that Type-C cable out and plug it back in to get video back.
I'd like to hear whether anyone else has had luck moving to a different PCIe x16 slot, running the fans at 100%, etc.
Edit note: running software version 20.12.2.
Decided to try an older game, Battlefront II.
It crashes about every hour. Hmm...
Did a fresh driver install using DDU in Safe Mode.
Going to give this another run.
sfc /scannow shows no issues.
Swapped out RAM so I was running 32 of 64 GB. Swapped the sticks around... same crashing issues.
Running no overclocks on the video card. We'll see what happens after this.