I really hope somebody can help me with this. I've had this computer since I built it in 2019, and until very recently, it's run fine through (literally) thousands of hours of gaming and general use. Within the last month, the GPU has begun acting funny.
First, the system:
i7-8700K (factory setting, no overclocking, ever)
Vega 64 (reference card), factory settings, no overclocking
ASUS Maximus X Hero MOBO
Corsair HXI 750 PSU
Windows 10
G.Skill RAM 2x8GB
Corsair H151 CPU Cooler
DisplayPort Cable to Monitor
Dual Cables to PSU (Not the Y-splitter)
This is a randomly occurring problem. Not very frequent, although once or twice a night is not out of the ordinary.
I can play HOURS of gaming (7 Days to Die) with absolutely no problem.
However, AFTER playing, shutting the game down, and returning to the desktop, I'll step away to grab a snack (leaving the computer idle), and either while I'm gone or shortly after I return (not using the computer, just sitting next to it watching TV), the screen will suddenly display a NO SIGNAL message. At that point the rest of the computer still seems to be running, but no signal is being sent to the monitor. The only way out is to flip the PSU power switch and force a full reboot.
Computer comes back like nothing was wrong. No error messages. No Safe mode. Just good to go.
At the time this is happening, you can hear the GPU trying to kick its fan on, and then off, over and over. And if you look AT the GPU, in sync with the sound of it attempting to run the fan, the RADEON LED logo blinks off...at the same time, the single red tach light on the card shuts off, and a green light flashes for about half a second. Then the RADEON LED goes back on, the green light goes off, and the first red tach light goes back on. This cycles, over and over, about every 20-30 seconds. Hand to the rear exhaust port feels much hotter than normal air. In other words, whatever is going on, it's not running its blower fan until I do a reboot.
I have checked:
-Power. It's not a power issue. I have a usage meter that is showing full system power usage (GPU and EVERYTHING ELSE) and that never goes above 450w. PSU is 750w.
-Thermal. It's not a thermal issue...unless, inside the VEGA 64, it is getting hot somewhere that the thermal sensors don't pick up. GPU during gaming never gets above 70°C, mostly in the 60s. No warnings, no thermal throttling, etc. Things are COOL.
I've swapped the PCI-e power cables for fresh ones from the Corsair box. Problem persists.
It's not the DP cable---when this problem occurs and I reboot, I can swap that cable over to the MOBO video DP output, and it will work fine the rest of the night.
The VEGA 64 fan DOES work. Just ran at 40-60% RPM speed while I gamed for the last two hours. It only stops working when the card stops sending a signal.
WINDOWS is set to NEVER screen save or power save: Screen ALWAYS on. Verified NOW.
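To put the power check above in numbers, here is a rough sketch of the arithmetic. It ASSUMES the usage meter reads wall-side (AC) draw and that the PSU converts at roughly 90% efficiency; both are assumptions, and `psu_headroom` is just an illustrative helper name. If the meter reads AC, the load on the PSU's output side is lower than the meter shows.

```python
def psu_headroom(meter_watts, psu_rated_watts, efficiency=0.90):
    """Estimate sustained PSU load from a wall-meter reading.

    efficiency is an ASSUMED AC-to-DC conversion figure, not a measured one.
    """
    # Power actually delivered to the components (DC side).
    dc_load = meter_watts * efficiency
    return {
        "dc_load_w": round(dc_load, 1),
        "headroom_w": round(psu_rated_watts - dc_load, 1),
        "load_fraction": round(dc_load / psu_rated_watts, 2),
    }


# Figures from the checks above: 450 W peak on the meter, 750 W PSU.
print(psu_headroom(450, 750))
```

With those figures the estimated sustained load is roughly 405 W, a little over half the PSU rating. A meter that averages over a second or so would not show shorter spikes, so this only describes sustained load.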
Things I'm down to/suspect/have tried (and here's where I hope somebody can be a hero):
1) Drivers. Yes, I've uninstalled, done the whole DDU thing. I've used current drivers, as well as drivers from a year ago that worked for over a year with no problems. I know WINDOWS 10 had an update in January (cannot UNINSTALL it, there's no option to do so). I know sometimes WINDOWS stomps on drivers like it's mad at them, so maybe a conflict there?
2) Radeon Adrenalin. Deleted. Reinstalled. When I see the problem, the best way I've found to get close to normal function back is to do this re-install, and force the Factory Reset. Related: in the last three days, each time I rebooted and tried to open Adrenalin, it told me it's not compatible with the AMD drivers (or something like that), so I delete and reinstall/factory reset AGAIN, just to make it work. (THIS, I think, is because WINDOWS is trying to force a new driver on top of what I want to use from AMD. Tonight I finally got off my *ss and disabled WINDOWS from updating GPU drivers. Ever. We'll see going forward if Adrenalin stops complaining, but that's just a side issue. Possibly a CLUE that it's driver related?)
3) When I run CPU/HWID, I can see that the GRAPHICS CLOCK in the VEGA 64 is showing as 26MHz when at idle (when I'm not doing anything)...or even doing only a little...like RIGHT NOW. I'm not an expert on the clocks and the States that you see in the Adrenalin software if you Enable GPU tuning, but I can see that State 0 is listed as 852MHz. I would think that means that at the lowest state, 852MHz should be the MINIMUM clock speed you see. Looking at HWID, the clock speed is SLOWLY bouncing from 26MHz up to 851MHz. Sometimes directly, sometimes jumping partially, into the 500MHz range, sometimes other places. No clue why, BUT I THINK THIS IS A BIG CLUE.
4) Thermal paste and a hidden overheat? I don't want to open this card up. I don't. I don't. No, I really don't. But if somebody that knows says that if I open and re-paste, or if I take it to a tech shop and they open it and can fix this with something like replacing a shorted part, then so be it. But before we go to opening this up, I'm really hoping this is NOT the answer. But I need to know.
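For item 3, one way to turn "the clock is bouncing around" into something countable is to log the GPU clock to a CSV file and tally how often it sits below, at, or above State 0. This is only a sketch: the column name `gpu_clock_mhz`, the 5 MHz tolerance, and the sample rows are made up for illustration, and a real monitoring tool's log will use its own column names. It does not say whether a 26MHz reading is normal; it only measures how often it shows up.

```python
import csv
import io
from collections import Counter

STATE0_MHZ = 852  # State 0 clock as listed in the tuning panel (from item 3)


def summarize_clocks(csv_text, column="gpu_clock_mhz", state0=STATE0_MHZ, tol=5):
    """Count samples below, at (within tol MHz), or above the State 0 clock.

    The column name is HYPOTHETICAL; change it to match the actual log.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            mhz = float(row[column])
        except (KeyError, TypeError, ValueError):
            # Missing column, short row, or non-numeric cell.
            counts["unreadable"] += 1
            continue
        if mhz < state0 - tol:
            counts["below_state0"] += 1
        elif mhz <= state0 + tol:
            counts["at_state0"] += 1
        else:
            counts["above_state0"] += 1
    return dict(counts)


# Made-up samples shaped like the readings described in item 3.
sample = "time,gpu_clock_mhz\n0,26\n1,851\n2,530\n3,26\n4,1536\n"
print(summarize_clocks(sample))
```

Run against a log captured in the minutes before a NO SIGNAL event, a tally like this would show whether the card was sitting at the very low clock when the signal dropped, which is easier to share with anyone trying to help than a description from memory.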
I want to know what's going on. Is this a piece of the hardware (capacitors, HBM, wire etching) slowly failing? Is this a bad driver that maybe AMD can update and fix? I've NEVER had this problem prior to a month or two ago, where it will completely lose signal at idle. I had (previously) experienced several reboots DURING heavy gaming that I attributed to a possible thermal/throttle/reboot type issue. At that point I enabled Adrenalin custom fan curves, raised them, and that pretty much fixed that issue. I think.
The fact that I can run for hours of gaming and the VEGA 64 works, does not overheat, and does not need more power than it's getting, makes me hopeful it's a driver/software/compatibility issue. But the 26MHz clock thing is just waving in my face like a red cape in front of a bull.
As I said above, I built this computer. (After a year of planning, so I wouldn't screw it up.) I know the basics. I'm not a repair dude pro. I can plug things in, unplug them, and not get shocked in the process. (Usually.) I know how to access the registry, but sweat bullets every time I think about doing so.
If anybody knows what is causing this, please tell me. Be technical if you have details. Go nuts.
If anybody has seen this EXACT issue, and resolved it,...please tell me how.
I do NOT have a spare computer for testing.
I do NOT have a spare GPU.
I saved up for five years to build this one, and was counting on the VEGA 64 to last me at least 5 years.
Hopefully, one of you can be my hero.
(And yes, AMD, if you want to help a guy on a fixed income out, I'd be more than happy to take a NEW VEGA 64, swap it in, see if it solves the problem, THEN send you the old one. NOT expecting that, but I'm close to crying some nights, and if you contacted me to send me a replacement,...well,...me love you long time, soldier.)
Thank you for your help. Even if you just read this and sympathized.