Hi, I've been struggling with this issue for a couple of weeks now, with about one BSOD a day. It's usually the same BSOD, or a game will crash to a black screen before coming back with visual glitches in the corner and clusters of off-color pixels. The GPU usage lights on my Vega card go dead, showing that the card shut down. I have a couple of dmp logs, but only two full memory dumps saved, because I didn't think to save them individually before. I've narrowed it down to a bad PSU or a bad GPU, and I'm looking into RMAing my card soon because I'm pretty frustrated with it.
The card seems to crash under anything from light to heavy load, but only in video games like Apex Legends, Risk of Rain 2, or MTG Arena. It never crashes during extended stress tests: 30 minutes of MSI Kombustor, or Valley Superposition. It was also fine during Fire Strike.
Most of the crashes were Thread Stuck in Device Driver, but I also got one Video TDR Failure, plus a System Service Exception when I swapped the spare card in without uninstalling the old drivers. That last one is fine now.
What I've done to narrow it down:
-Disabled all CPU/RAM overclocks and put the GPU into the spare BIOS that has a power limit.
-Tried a spare card that's weaker and older (750 Ti) and repeated the same things that seemed to replicate the problem (loading screens of some games).
-Ran Memtest86 and Prime95 for hours each, with no errors. Also tested GPU memory using MemtestCL and had 0 errors.
I have the dmps available to post, but after changing to my old card they appear to just say dxgkernel by default instead of the atikmpag.sys and atikmdag.sys modules they listed before. The error codes might still be intact, however.
Forgot to add: I've reinstalled the drivers more than once, using DDU in safe mode to do so. I also tried older driver versions, and tried skipping AMD Settings and letting it run on the Adrenalin drivers from Device Manager updates. I reset Windows with a fresh install from the reset option and installed the drivers again, but it still crashed. What confuses me most is that it doesn't crash in stress tests but does crash during things that aren't as intensive.
Check your Windows Update history and make sure you aren't missing any .NET runtime updates (either failed installs or pending ones).
These usually take multiple restarts when installing.
It's possible that's the cause here.
Otherwise it sounds like the card is overheating or has some other hardware issue, in which case you might want to get an RMA going, because that's just not normal.
I see. I looked into the update history: one Windows cumulative update failed, but the newer version of it finished, and checking for updates finds no new ones, so it doesn't appear to need anything. If it's heat or hardware, could it have developed suddenly?
Could it just be higher temps because there's more and more summer heat in my area? I don't see any of the sensors going past 90C (at max, it's about 82C core, 85C mem), except for the hot spot, which reaches 100C.
Since it was fine before, I assumed it was software related because of the specific BSOD I was getting. I'm looking into an RMA with it set up, I just wanted to exhaust all my options before I sent it in, thank you.
That's pretty hot. It's on the high side of hot, and your VRMs are probably worse than that, so you may be getting the instability there as well.
Ah, I see. I guess I didn't consider it before because I thought it had to exceed 85C (what people said was fine) on sensors, but I forgot to account for the hotter parts. So far I've improved the ventilation in the small room, and haven't had any issues, so hopefully that was it. Thank you!
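For anyone who wants to sanity-check their own readings the same way, here is a minimal Python sketch. It only compares numbers typed in by hand against one limit. The function name `flag_hot_sensors` and the constant `ASSUMED_LIMIT_C` are made up for illustration, and the 85C limit is just the rule of thumb mentioned in this thread, not a vendor specification.

```python
# Hypothetical helper, not part of any AMD or monitoring tool.
# Readings are entered by hand from whatever sensor app you use.

ASSUMED_LIMIT_C = 85  # rule of thumb quoted in this thread, NOT a vendor spec


def flag_hot_sensors(readings_c, limit_c=ASSUMED_LIMIT_C):
    """Return names of sensors whose reading is above limit_c, hottest first."""
    hot = [(temp, name) for name, temp in readings_c.items() if temp > limit_c]
    return [name for temp, name in sorted(hot, reverse=True)]


# Peak values reported earlier in the thread.
readings = {"core": 82, "memory": 85, "hot spot": 100}
print(flag_hot_sensors(readings))
```

With the numbers from this thread, only the hot spot is flagged, which is the reading that's easy to overlook when watching just the core and memory sensors.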
I've done a great deal trying to get my ASUS Strix Vega 64 under thermal control... I'm thinking it may be the thermal pads for the VRMs (I've read about this card), and I've had the black screen problems too...
So I started using the Radeon Chill feature and kept the drivers up to date, and this seems to have temporarily suspended the thermal issues.
I'm still seeing high temperatures on the SoC VRMs and the memory VRMs. The hot spot reached 90C during a casual 1080p Fire Strike normal custom windowed run.
...just ridiculous. Pads in the mail and the card's getting a teardown this weekend.
I've only heard a couple of things about the Strix Vegas, so I hope it works out for you with those pads. I'm using an MSI Airboost 56, so it's loud, but the cooling hasn't been too bad since I have an aggressive fan curve. Until summer started, at least. So far keeping the door open helps a ton.
I've just bought a new MSI Radeon RX Vega 56 Air Boost 8GB OC HBM2 graphics card for my game streaming on Twitch, to replace my RX 580, and as soon as Radeon Adrenalin picked it up it crashed with atikmdag.sys named as the cause on the BSOD screen. I installed a fresh copy of Windows 10 and added all the drivers OK, then tried the latest drivers, and the same thing happened.
Tried again with driver check turned off, same. Tried in safe mode, same. Went to bed.
At 5:30am this morning I moved the card from the top slot in my Tomahawk MBD to the bottom slot, and now it works fine. Rechecked with a GPU checker and it runs like a dream. Don't know if this will fix anyone else's problems, but I hope it does.