I posted this over on Tom's Hardware, but it hasn't gotten any traction yet. Here is the original post (it also includes some MemDump files, in a zip on Dropbox): Extremely Peculiar Issue, GPU Kernel Crash, BSOD - GPUs - Windows 10
For those that don't want to follow through to the link, here is the entire content of the post, as well as one of the BugCheck analysis results:
I've been working on figuring out what on Earth is causing this, and have had little luck so far.
Let me start with my current system specs:
CPU: AMD FX8320E
Mobo: Asus Crosshair-V Formula Z
RAM: EVGA 2x8GB DDR3 2400
GPU: Asus Strix R9-Fury (16.7.3 Driver, not installed Beta driver yet)
Corsair H100i GTX Cooler
PSU: EVGA 1000P2
Storage: Samsung 850 Pro 500GB
2x Seagate Barracuda 2TB 7200 RPM
Asus PB287Q via DP Cable
The original rig was bought about a year ago, mid-September. The storage, cooler, GPU, and PSU are still original from that build. I have the original parts lying around still, from various swaps I've tried. I had 0 issues with system stability until around February 2016. 4K Gaming, while having some FPS issues in certain games, was flawless. Streaming, encoding, whatever I wanted to do, my rig was stable.
I believe it was around the 16.3.1 Driver release that my rig started becoming frequently unstable. Random rebooting, display cutting out, 'snow' on screen when in 4K (this slowly migrated over to 1080P as well). The odd part about the 'snow' was it was only visible locally. If I used TeamViewer to remote in, the snow wasn't visible. I fought this for a few months, replacing random items I felt may have been the culprit. DP Cable, RAM, CPU, and finally the Motherboard; all to no avail.
I performed a fresh reinstall of Win10, and I was good for about a week until these blips started cropping up again. Luckily I've not had the random reboot lately, but now I do get full-blown GPU Driver crashing (and recovering). The game that usually triggers this is Witcher 3, though I also get it when just browsing the web doing basically nothing, just not nearly as often as with W3. And here is the really odd part. I can play W3 for 8+ hours perfectly fine some days, and crash after 20min another. Load it up again, 4-5 hours stable. It doesn't make any sense.
As of this morning, I've received 3 BSODs with IRQL_NOT_LESS_OR_EQUAL codes. One was triggered by Magic Duels, two by Witcher 3. I've used BSV and VS2015 Debugging, but neither has proved very fruitful. The best I've gotten so far is that something is freaking out ntoskrnl.sys, but I can't seem to locate what. I'm presuming it's still the AMD Driver, but I've not seen solid proof of it yet. I know the screen flickers, and I get the typical 'sound' my OS makes when the driver crashes and recovers, but now they are BSODs instead of just dropping back to desktop.
I'm honestly suspecting there is something wrong with the GPU at this point; I just don't quite have another $500+ to replace it yet (short of throwing it in someone else's rig to see if the behavior follows).
Here is a DropBox link to all 3 dmp files. Any help would be greatly appreciated. If there is any other info needed, let me know.
Edit: I forgot to add, I'm not doing any overclocking. I've got MSI Afterburner running for custom fan control profiles, and my BIOS should be set to 'normal', not performance.
Update2: Here is the bugcheck analysis:
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arg1: 0000000000000040, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80334cf8503, address which referenced memory
READ_ADDRESS: unable to get nt!MmSpecialPoolStart
unable to get nt!MmSpecialPoolEnd
unable to get nt!MmPagedPoolEnd
unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
fffff803`34cf8503 488b14c2 mov rdx,qword ptr [rdx+rax*8]
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
TRAP_FRAME: ffffd000dac9c780 -- (.trap 0xffffd000dac9c780)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000008 rbx=0000000000000000 rcx=ffffe000cef9ed80
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80334cf8503 rsp=ffffd000dac9c910 rbp=0000000000000008
r8=0000000000000000 r9=0000000000000000 r10=ffffe000ce9ee000
r11=ffffd000dac9c9d0 r12=0000000000000000 r13=0000000000000000
iopl=0 nv up ei pl zr na po nc
fffff803`34cf8503 488b14c2 mov rdx,qword ptr [rdx+rax*8] ds:00000000`00000040=????????????????
Resetting default scope
LAST_CONTROL_TRANSFER: from fffff80334d564e9 to fffff80334d4b940
ffffd000`dac9c638 fffff803`34d564e9 : 00000000`0000000a 00000000`00000040 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
ffffd000`dac9c640 fffff803`34d54cc7 : ffffe000`cc4de080 00000000`00000000 00000000`00000001 00000000`00000000 : nt!KiBugCheckDispatch+0x69
ffffd000`dac9c780 fffff803`34cf8503 : 00000000`00000000 ffffe000`ce9ee000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x247
ffffd000`dac9c910 fffff801`efc7ba34 : ffffe000`ced983a0 fffff801`efb5c07b ffffe000`ce9dc000 fffff801`00000000 : nt!PoFxActivateComponent+0x3f
ffffd000`dac9c940 fffff801`efb5bbfc : ffffffff`ffffffff 00000000`c0000001 ffffe000`ce9dc000 ffffcb1f`190e41b6 : dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+0x2b0
ffffd000`dac9c9f0 fffff801`efb5b19a : ffffffff`ffffffff 00000000`002fd831 00000000`002fd831 ffffe000`ce9dc000 : dxgmms2!VidSchiSwitchContextWithCheck+0x37c
ffffd000`dac9ca80 fffff801`efbba75d : ffffe000`cefd95e0 ffffd000`dac9cbd0 ffffe000`cefd9500 ffffe000`ce9ee000 : dxgmms2!VidSchiScheduleCommandToRun+0x41a
ffffd000`dac9cb80 fffff801`efbba720 : ffffe000`ce9ee500 ffffe000`ce9ee000 00000000`00000080 ffffe000`cb892700 : dxgmms2!VidSchiRun_PriorityTable+0x2d
ffffd000`dac9cbd0 fffff803`34c4fa45 : 00000204`a4bb3dfe fffff803`34d509af 00000000`00010023 00000000`016e2754 : dxgmms2!VidSchiWorkerThread+0x80
ffffd000`dac9cc10 fffff803`34d50ae6 : ffffd000`d7db2180 ffffe000`cedff840 fffff803`34c4fa04 fffff801`ee47d8af : nt!PspSystemThreadStartup+0x41
ffffd000`dac9cc60 00000000`00000000 : ffffd000`dac9d000 ffffd000`dac97000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
fffff801`efc7ba34 4584ff test r15b,r15b
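For anyone who doesn't read these dumps often, here is a minimal sketch of how the four arguments above decode. The bit meanings for Arg3 are taken straight from the debugger text; the IRQL names are the usual values for the low levels on x64, and the function name `decode_bugcheck_a` is just something made up for illustration.

```python
# Decode the four arguments of the IRQL_NOT_LESS_OR_EQUAL (0xA) bugcheck above.
# Bit meanings for Arg3 come from the debugger text itself; the IRQL names
# are the usual low-level values on x64. decode_bugcheck_a is a hypothetical helper.

IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def decode_bugcheck_a(arg1, arg2, arg3, arg4):
    """Return a readable summary of the bugcheck 0xA arguments."""
    if arg3 & 0x8:        # bit 3 set: execute operation
        access = "execute"
    elif arg3 & 0x1:      # bit 0 set: write operation
        access = "write"
    else:                 # bit 0 clear: read operation
        access = "read"
    return {
        "memory_referenced": hex(arg1),
        "irql": IRQL_NAMES.get(arg2, f"IRQL {arg2}"),
        "access": access,
        "faulting_instruction": hex(arg4),
    }


if __name__ == "__main__":
    # Values copied from Arg1..Arg4 in the analysis above.
    summary = decode_bugcheck_a(0x40, 0x2, 0x0, 0xFFFFF80334CF8503)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

With the values from this dump, that works out to a read of address 0x40 at DISPATCH_LEVEL.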
Some results have been pointing me towards disabling RPTM, but I'm not using Intel for anything outside of the integrated onboard NIC, so I'm still at a loss as to what is causing this. I've still got a few ideas up my sleeve though.
The faulting IP and the call are both throwing me off.
Faulting IP: nt!PoFxActivateComponent+3f
Faulting Bucket ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker
It sounds like, while I had Witcher 3 playing, Windows attempted to flip power on the display adapter when it was already running, freaking the kernel out and faulting the system. I could be completely off, but I'm basing this on what the SetPower callback is documented to do: DxgkCbSetPowerComponentActive routine (Windows Drivers)
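One thing that can be checked directly from the dump is the address the faulting instruction tried to read. The sketch below parses the rax/rdx lines from the trap frame and evaluates the `[rdx+rax*8]` operand of the `mov` at nt!PoFxActivateComponent+0x3f. The helper names are made up for illustration, and the debugger itself warns that some trap frame registers may be zeroed or incorrect, so treat this as a consistency check rather than proof.

```python
import re

# Register lines copied from the trap frame in the dump above.
TRAP_FRAME = """
rax=0000000000000008 rbx=0000000000000000 rcx=ffffe000cef9ed80
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
"""


def parse_registers(text):
    """Turn 'name=hexvalue' pairs into a dict of integers (hypothetical helper)."""
    pairs = re.findall(r"(\w+)=([0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(regs, base, index, scale):
    """Evaluate a [base + index*scale] operand, wrapped to 64 bits."""
    return (regs[base] + regs[index] * scale) & 0xFFFFFFFFFFFFFFFF


regs = parse_registers(TRAP_FRAME)
addr = effective_address(regs, "rdx", "rax", 8)  # mov rdx,qword ptr [rdx+rax*8]
print(hex(addr))  # prints 0x40
```

The result matches both Arg1 and the `ds:00000000`00000040` shown next to the instruction. If the rdx value is accurate, the base pointer was zero and the code was reading entry 8 of a table that was never set up (or was already torn down), which would fit a bad or stale power component handle being passed down from dxgkrnl, but it doesn't say whether the driver or the hardware is to blame.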
I had an idea to underclock my GPU a few MHz, and increase voltage slightly, but I've not actually done this yet as I want to be able to replace this if I royally FUBAR the card doing this.