Odd Kernel Crash, BSOD, IRQL Problems

Question asked by noodlesdefyyou on Aug 24, 2016
Latest reply on Sep 6, 2016 by timtim32

I posted this over on TomsHardware, but it hasn't gotten any traction yet. Here is the original post (it includes some MemDump files as well, in a zip on Dropbox): Extremely Peculiar Issue, GPU Kernel Crash, BSOD - GPUs - Windows 10

 

For those that don't want to follow through to the link, here is the entire content of the post, as well as one of the BugCheck analysis results:

 

I've been working on figuring out what on Earth is causing this, and have had little luck so far.

 

Let me start with my current system specs:

 

CPU: AMD FX8320E

Mobo: Asus Crosshair-V Formula Z

RAM: EVGA 2x8GB DDR3 2400

GPU: Asus Strix R9-Fury (16.7.3 Driver, not installed Beta driver yet)

Cooler: Corsair H100i GTX

PSU: EVGA 1000P2

Storage: Samsung 850 Pro 500GB

2x Seagate Barracuda 2TB 7200 RPM

Monitor: Asus PB287Q via DP Cable

OS: Win10

 

 

The original rig was bought about a year ago, in mid-September. The storage, cooler, GPU, and PSU are still original from that build. I still have the other original parts lying around from the various swaps I've tried. I had zero issues with system stability until around February 2016. 4K gaming, while having some FPS issues in certain games, was flawless. Streaming, encoding, whatever I wanted to do, my rig was stable.

 

I believe it was around the 16.3.1 driver release that my rig started becoming frequently unstable: random rebooting, the display cutting out, and 'snow' on screen when in 4K (this slowly migrated over to 1080P as well). The odd part about the 'snow' was that it was only visible locally. If I used TeamViewer to remote in, the snow wasn't visible. I fought this for a few months, replacing items I felt might have been the culprit: the DP cable, RAM, CPU, and finally the motherboard, all to no avail.

 

I performed a fresh reinstall of Win10 and was good for about a week until these blips started cropping up again. Luckily I've not had the random reboots lately, but now I get full-blown GPU driver crashes (and recoveries). The game that usually triggers this is Witcher 3, though it also happens when I'm just browsing the web doing basically nothing, just not nearly as often. And here is the really odd part: I can play W3 for 8+ hours perfectly fine some days, and crash after 20 minutes on another. Load it up again, 4-5 hours stable. It doesn't make any sense.

 

 

As of this morning, I've received 3 BSODs with IRQL_NOT_LESS_OR_EQUAL codes: one triggered by Magic Duels and two triggered by Witcher 3. I've used BSV and VS2015 debugging, but neither has been very fruitful. The best I've gotten so far is that something is freaking out ntoskrnl.sys, but I can't seem to locate what. I'm presuming it's still the AMD driver, but I've not seen solid proof of it yet. The screen still flickers and I hear the typical 'sound' my OS makes when the driver crashes and recovers, but now the result is a BSOD instead of just dropping back to desktop.

 

I'm honestly suspecting there is something wrong with the GPU at this point; I just don't have another $500+ to replace it yet (short of throwing it in someone else's rig to see if the behavior follows).

 

Here is a DropBox link to all 3 dmp files. Any help would be greatly appreciated. If there is any other info needed, let me know.

 

 

Edit: I forgot to add, I'm not doing any overclocking. I've got MSI Afterburner running for custom fan control profiles, and my BIOS should be set to 'normal', not performance.

 

 

Update2: Here is the bugcheck analysis:

 

*******************************************************************************

 

Bugcheck Analysis

 

*******************************************************************************

 

IRQL_NOT_LESS_OR_EQUAL (a)

An attempt was made to access a pageable (or completely invalid) address at an

interrupt request level (IRQL) that is too high. This is usually

caused by drivers using improper addresses.

If a kernel debugger is available get the stack backtrace.

Arguments:

Arg1: 0000000000000040, memory referenced

Arg2: 0000000000000002, IRQL

Arg3: 0000000000000000, bitfield :

bit 0 : value 0 = read operation, 1 = write operation

bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)

Arg4: fffff80334cf8503, address which referenced memory

 

Debugging Details:

------------------

 

 

READ_ADDRESS: unable to get nt!MmSpecialPoolStart

unable to get nt!MmSpecialPoolEnd

unable to get nt!MmPagedPoolEnd

unable to get nt!MmNonPagedPoolStart

unable to get nt!MmSizeOfNonPagedPoolInBytes

0000000000000040

 

CURRENT_IRQL: 2

 

FAULTING_IP:

nt!PoFxActivateComponent+3f

fffff803`34cf8503 488b14c2 mov rdx,qword ptr [rdx+rax*8]

 

CUSTOMER_CRASH_COUNT: 1

 

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

 

BUGCHECK_STR: AV

 

PROCESS_NAME: System

 

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

 

TRAP_FRAME: ffffd000dac9c780 -- (.trap 0xffffd000dac9c780)

NOTE: The trap frame does not contain all registers.

Some register values may be zeroed or incorrect.

rax=0000000000000008 rbx=0000000000000000 rcx=ffffe000cef9ed80

rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000

rip=fffff80334cf8503 rsp=ffffd000dac9c910 rbp=0000000000000008

r8=0000000000000000 r9=0000000000000000 r10=ffffe000ce9ee000

r11=ffffd000dac9c9d0 r12=0000000000000000 r13=0000000000000000

r14=0000000000000000 r15=0000000000000000

iopl=0 nv up ei pl zr na po nc

nt!PoFxActivateComponent+0x3f:

fffff803`34cf8503 488b14c2 mov rdx,qword ptr [rdx+rax*8] ds:00000000`00000040=????????????????

Resetting default scope

 

LAST_CONTROL_TRANSFER: from fffff80334d564e9 to fffff80334d4b940

 

STACK_TEXT:

ffffd000`dac9c638 fffff803`34d564e9 : 00000000`0000000a 00000000`00000040 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx

ffffd000`dac9c640 fffff803`34d54cc7 : ffffe000`cc4de080 00000000`00000000 00000000`00000001 00000000`00000000 : nt!KiBugCheckDispatch+0x69

ffffd000`dac9c780 fffff803`34cf8503 : 00000000`00000000 ffffe000`ce9ee000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x247

ffffd000`dac9c910 fffff801`efc7ba34 : ffffe000`ced983a0 fffff801`efb5c07b ffffe000`ce9dc000 fffff801`00000000 : nt!PoFxActivateComponent+0x3f

ffffd000`dac9c940 fffff801`efb5bbfc : ffffffff`ffffffff 00000000`c0000001 ffffe000`ce9dc000 ffffcb1f`190e41b6 : dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+0x2b0

ffffd000`dac9c9f0 fffff801`efb5b19a : ffffffff`ffffffff 00000000`002fd831 00000000`002fd831 ffffe000`ce9dc000 : dxgmms2!VidSchiSwitchContextWithCheck+0x37c

ffffd000`dac9ca80 fffff801`efbba75d : ffffe000`cefd95e0 ffffd000`dac9cbd0 ffffe000`cefd9500 ffffe000`ce9ee000 : dxgmms2!VidSchiScheduleCommandToRun+0x41a

ffffd000`dac9cb80 fffff801`efbba720 : ffffe000`ce9ee500 ffffe000`ce9ee000 00000000`00000080 ffffe000`cb892700 : dxgmms2!VidSchiRun_PriorityTable+0x2d

ffffd000`dac9cbd0 fffff803`34c4fa45 : 00000204`a4bb3dfe fffff803`34d509af 00000000`00010023 00000000`016e2754 : dxgmms2!VidSchiWorkerThread+0x80

ffffd000`dac9cc10 fffff803`34d50ae6 : ffffd000`d7db2180 ffffe000`cedff840 fffff803`34c4fa04 fffff801`ee47d8af : nt!PspSystemThreadStartup+0x41

ffffd000`dac9cc60 00000000`00000000 : ffffd000`dac9d000 ffffd000`dac97000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

 

 

STACK_COMMAND: kb

 

FOLLOWUP_IP:

dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+2b0

fffff801`efc7ba34 4584ff test r15b,r15b

 

SYMBOL_STACK_INDEX: 4

 

SYMBOL_NAME: dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+2b0

 

FOLLOWUP_NAME: MachineOwner

 

MODULE_NAME: dxgkrnl

 

IMAGE_NAME: dxgkrnl.sys

 

DEBUG_FLR_IMAGE_TIMESTAMP: 57a1b59f

 

IMAGE_VERSION: 10.0.10586.545

 

BUCKET_ID_FUNC_OFFSET: 2b0

 

FAILURE_BUCKET_ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker

 

BUCKET_ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker

 

ANALYSIS_SOURCE: KM

 

FAILURE_ID_HASH_STRING: km:av_dxgkrnl!dxgadapter::setpowercomponentactivecbworker

 

FAILURE_ID_HASH: {a533151a-1aae-b601-c7af-0921bbb04a67}

 

Followup: MachineOwner
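
For anyone reading the four bugcheck arguments above, here is a minimal Python sketch that decodes them the way the analysis text describes. The constants are copied straight from the output; the function name `decode_access` and the variable names are just illustrative, not anything from WinDbg.

```python
# Values copied from the bugcheck output above.
ARG1_MEMORY_REFERENCED = 0x0000000000000040
ARG2_IRQL = 0x2
ARG3_BITFIELD = 0x0
ARG4_FAULTING_IP = 0xFFFFF80334CF8503


def decode_access(bitfield: int) -> dict:
    """Decode Arg3 using the bit meanings printed in the analysis text."""
    return {
        "write": bool(bitfield & 0x1),    # bit 0: 0 = read, 1 = write
        "execute": bool(bitfield & 0x8),  # bit 3: 1 = execute operation
    }


access = decode_access(ARG3_BITFIELD)
if access["execute"]:
    kind = "execute"
elif access["write"]:
    kind = "write"
else:
    kind = "read"

summary = (
    f"{kind} of 0x{ARG1_MEMORY_REFERENCED:x} "
    f"at IRQL {ARG2_IRQL} from 0x{ARG4_FAULTING_IP:x}"
)
print(summary)
```

So this crash is a read of address 0x40 at IRQL 2 (DISPATCH_LEVEL), issued from the instruction at nt!PoFxActivateComponent+0x3f.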

 

 

Some results have been pointing me towards disabling RPTM, but I'm not using Intel for anything outside of the integrated onboard NIC, so I'm still at a loss as to what is causing this. I've still got a few ideas up my sleeve though.

 

 

 

The faulting IP and the call are both throwing me off.

 

Faulting IP: nt!PoFxActivateComponent+3f

Faulting Bucket ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker

Module: dxgkrnl.sys
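
As a sanity check on the faulting instruction, the effective address of `mov rdx,qword ptr [rdx+rax*8]` can be worked out from the TRAP_FRAME registers. This is only a sketch: the register values are copied from the dump, which itself warns that the trap frame may not contain every register accurately, and the variable names are illustrative.

```python
# Register values copied from the TRAP_FRAME in the dump above.
# The dump warns that some register values may be zeroed or incorrect.
rdx = 0x0000000000000000  # base register
rax = 0x0000000000000008  # index register
SCALE = 8                 # the "*8" in [rdx+rax*8], i.e. 8-byte entries

# Effective address of: mov rdx, qword ptr [rdx + rax*8]
effective_address = (rdx + rax * SCALE) & 0xFFFFFFFFFFFFFFFF

READ_ADDRESS = 0x40  # Arg1 / READ_ADDRESS from the bugcheck

print(hex(effective_address))
print(effective_address == READ_ADDRESS)
```

The computed address matches the 0x40 reported as the memory referenced. If the trap frame values are accurate, that would be consistent with a null base pointer being indexed at entry 8, rather than a random wild pointer.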

 

It sounds like, while I had Witcher 3 running, Windows attempted to flip power on the display adapter when it was already active, freaking the kernel out and faulting the system. I could be completely off, but I'm going based on what the SetPower thing is: DxgkCbSetPowerComponentActive routine (Windows Drivers)

 

 

I had an idea to underclock my GPU a few MHz, and increase voltage slightly, but I've not actually done this yet as I want to be able to replace this if I royally FUBAR the card doing this.
