By now I'm out of ideas on what to try.
Asus Strix Vega 64
Intel i5 4570
Gigabyte Z87M-D3H with latest BIOS
32 GB DDR3
Corsair RM650 PSU
I'm currently running a fully updated Windows 10 and constantly get either a BSOD with THREAD_STUCK_IN_DEVICE_DRIVER or a crash to desktop with an AMD driver timeout; which of the two happens depends on the driver version. The oldest driver I tested is 19.12.1, the newest is the current 21.4.1, and I tested several in between as well.
When switching driver versions I always uninstalled the old one, booted into Safe Mode and ran DDU.
Games crash anywhere from a few minutes to well over an hour into gameplay, and it isn't tied to anything reproducible; it's completely random.
Weirdly enough, there is a fairly high chance of a crash when a YouTube video is playing and I open the Steam client.
Other GPU-intensive tasks like upscaling video files run completely stable.
Before switching back to Windows about 2 months ago, I used the card for over 2 years on Manjaro Linux with the open source Mesa driver and never had any stability issues or driver-related crashes.
What I tried so far aside from various driver versions:
-Disabled all onboard features I don't use (the Intel GPU and onboard audio are disabled in the BIOS, and drivers for them were never installed)
-Disabled both the Radeon and Steam overlays
-Tested RAM with both the Windows Memory Diagnostic and MemTest from HCI Design; neither found any faults in multiple runs. Since I had to run 16 instances of MemTest to cover all 32 GB, this also kept the CPU at 100% load for about 10 hours, so I would say the CPU runs stable under load
-Set PCI Express Link State Power Management to off in the advanced power settings. This was recommended on the Windows forums for AMD-based laptops, and I feel that it did indeed make things a little more stable
-Ran a 45-minute Furmark stress test without issues or errors; however, I'm running it again as I'm typing and plan to run a longer session this time
-Tested games that use DirectX, OpenGL and Vulkan; all APIs suffer from the same issue
-Installed a fresh Windows 10 from up-to-date installation media, because the media I originally used for my Windows installation was several years old
Anything else I could try or do to get this thing running stable?
Kind of an update to this:
My second Furmark stress test ran perfectly fine for over 90 minutes, no issues.
I tested a couple of driver versions with the "Driver only" option during install, completely avoiding the Radeon Software. This didn't improve anything.
I also disabled Fastboot in the Windows options, which I found as a recommendation in an older thread on the AMD forums. Didn't help either.
Since I'm running mismatched RAM (16 GB Corsair Vengeance, 16 GB Crucial something, both rated at 1600), I ran the system with each manufacturer's modules on their own to make sure the system isn't overly sensitive to mismatched modules for some reason. Didn't help either.
Until now I had somewhat ruled out overheating, because I run the system with the side panel open anyway (my case has a very limited air intake area). However, I decided to set up a small room fan blowing air onto the GPU from below and partially into the RAM / CPU area. So far this has kept games from crashing, so I do have an overheating issue for sure. Both the CPU and GPU heatsinks are clean. Guess I have to investigate this; I might have to check the thermal paste on the CPU.
Is it expected behaviour for the AMD driver to crash in case of overheating? I always assumed that the system would either freeze or shut off entirely.
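To narrow the overheating theory down further, temperatures logged during a gaming session could be compared against the time of a crash. The following is only a rough sketch of that idea, assuming some monitoring tool can export a CSV log; the column names `timestamp` and `gpu_temp_c` and the function `peak_before_crash` are made up for illustration and would have to be adapted to whatever the tool actually writes.

```python
import csv
import io
from datetime import datetime, timedelta


def peak_before_crash(csv_text, crash_time, window_minutes=5, column="gpu_temp_c"):
    """Return the highest value of `column` logged in the minutes before a crash.

    Assumed (hypothetical) log format: a CSV with a header row, an ISO 8601
    `timestamp` column and a numeric temperature column in degrees Celsius.
    Returns None when no sample falls inside the window.
    """
    window_start = crash_time - timedelta(minutes=window_minutes)
    peak = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        stamp = datetime.fromisoformat(row["timestamp"])
        # Only look at samples between window_start and the crash itself.
        if window_start <= stamp <= crash_time:
            value = float(row[column])
            if peak is None or value > peak:
                peak = value
    return peak
```

The crash time itself would come from the Windows Event Viewer or the timestamp of the minidump. If the peak in the last few minutes before each crash is consistently close to the card's thermal limit, that would support the overheating explanation; if crashes also happen at moderate temperatures, something else is likely involved as well.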
However, I still have an issue where the AMD driver has an unusually high chance of crashing when switching resolutions while a game is in fullscreen mode.