7 Replies Latest reply on Sep 6, 2016 3:08 PM by timtim32

    Odd Kernel Crash, BSOD, IRQL Problems

    noodlesdefyyou

      I posted this over on TomsHardware, but it's not had any traction yet. Here is the original post (includes some MemDump files as well, in a zip on dropbox): Extremely Peculiar Issue, GPU Kernel Crash, BSOD - GPUs - Windows 10

       

      For those that don't want to follow through to the link, here is the entire content of the post, as well as one of the BugCheck analysis results:

       

      I've been working on figuring out what on Earth is causing this, and have had little luck so far.

       

      Let me start with my current system specs:

       

      CPU: AMD FX8320E

      Mobo: Asus Crosshair-V Formula Z

      RAM: EVGA 2x8GB DDR3 2400

      GPU: Asus Strix R9-Fury (16.7.3 Driver, not installed Beta driver yet)

      Corsair H100i GTX Cooler

      PSU: EVGA 1000P2

      Storage: Samsung 850 Pro 500GB

      2x Seagate Barracuda 2TB 7200 RPM

      Asus PB287Q via DP Cable

      Win10

       

       

      The original rig was bought about a year ago, mid-September. The storage, cooler, GPU, and PSU are still original from that build. I have the original parts lying around still, from various swaps I've tried. I had 0 issues with system stability until around February 2016. 4K Gaming, while having some FPS issues in certain games, was flawless. Streaming, encoding, whatever I wanted to do, my rig was stable.

       

      I believe it was around the 16.3.1 Driver release that my rig started becoming frequently unstable. Random rebooting, display cutting out, 'snow' on screen when in 4K (this slowly migrated over to 1080P as well). The odd part about the 'snow' was it was only visible locally. If I used TeamViewer to remote in, the snow wasn't visible. I fought this for a few months, replacing random items I felt may have been the culprit. DP Cable, RAM, CPU, and finally the Motherboard; all to no avail.

       

      Performed a fresh reinstall of Win10, and I was good for about a week until these blips started cropping up again. Luckily I've not had the random reboot lately, but now I do get full-blown GPU Driver crashing (and recovering). The game that usually triggers this is Witcher 3, though I also get it when just browsing the web doing basically nothing, just not nearly as often as with W3. And here is the really odd part. I can play W3 for 8+ hours perfectly fine some days, and crash after 20min another. Load it up again, 4-5 hours stable. It doesn't make any sense.

       

       

      As of this morning, I've received 3 BSODs with IRQL_NOT_LESS_OR_EQUAL codes: one triggered by Magic Duels, two triggered by Witcher 3. I've used BSV and VS2015 Debugging, but neither has turned up much. The best I've gotten so far is that something is freaking out ntoskrnl.sys, but I can't seem to locate what. I'm presuming it's still the AMD Driver, but I've not seen solid proof of it yet. I know the screen flickers, and I get the typical 'sound' my OS makes when the driver crashes and recovers, but now they are BSODs instead of just dropping back to desktop.

       

      I'm honestly suspecting there is something wrong with the GPU at this point; I just don't have another $500+ to replace it yet (short of throwing it in someone else's rig to see if the behavior follows).

       

      Here is a DropBox link to all 3 dmp files. Any help would be greatly appreciated. If there is any other info needed, let me know.

       

       

      Edit: I forgot to add, I'm not doing any overclocking. I've got MSI Afterburner running for custom fan control profiles, and my BIOS should be set to 'normal', not performance.

       

       

      Update2: Here is the bugcheck analysis:

       

      *******************************************************************************

       

      Bugcheck Analysis

       

      *******************************************************************************

       

      IRQL_NOT_LESS_OR_EQUAL (a)

      An attempt was made to access a pageable (or completely invalid) address at an

      interrupt request level (IRQL) that is too high. This is usually

      caused by drivers using improper addresses.

      If a kernel debugger is available get the stack backtrace.

      Arguments:

      Arg1: 0000000000000040, memory referenced

      Arg2: 0000000000000002, IRQL

      Arg3: 0000000000000000, bitfield :

      bit 0 : value 0 = read operation, 1 = write operation

      bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)

      Arg4: fffff80334cf8503, address which referenced memory

       

      Debugging Details:

      ------------------

       

       

      READ_ADDRESS: unable to get nt!MmSpecialPoolStart

      unable to get nt!MmSpecialPoolEnd

      unable to get nt!MmPagedPoolEnd

      unable to get nt!MmNonPagedPoolStart

      unable to get nt!MmSizeOfNonPagedPoolInBytes

      0000000000000040

       

      CURRENT_IRQL: 2

       

      FAULTING_IP:

      nt!PoFxActivateComponent+3f

      fffff803`34cf8503 488b14c2 mov rdx,qword ptr [rdx+rax*8]

       

      CUSTOMER_CRASH_COUNT: 1

       

      DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

       

      BUGCHECK_STR: AV

       

      PROCESS_NAME: System

       

      ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

       

      TRAP_FRAME: ffffd000dac9c780 -- (.trap 0xffffd000dac9c780)

      NOTE: The trap frame does not contain all registers.

      Some register values may be zeroed or incorrect.

      rax=0000000000000008 rbx=0000000000000000 rcx=ffffe000cef9ed80

      rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000

      rip=fffff80334cf8503 rsp=ffffd000dac9c910 rbp=0000000000000008

      r8=0000000000000000 r9=0000000000000000 r10=ffffe000ce9ee000

      r11=ffffd000dac9c9d0 r12=0000000000000000 r13=0000000000000000

      r14=0000000000000000 r15=0000000000000000

      iopl=0 nv up ei pl zr na po nc

      nt!PoFxActivateComponent+0x3f:

      fffff803`34cf8503 488b14c2 mov rdx,qword ptr [rdx+rax*8] ds:00000000`00000040=????????????????

      Resetting default scope

       

      LAST_CONTROL_TRANSFER: from fffff80334d564e9 to fffff80334d4b940

       

      STACK_TEXT:

      ffffd000`dac9c638 fffff803`34d564e9 : 00000000`0000000a 00000000`00000040 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx

      ffffd000`dac9c640 fffff803`34d54cc7 : ffffe000`cc4de080 00000000`00000000 00000000`00000001 00000000`00000000 : nt!KiBugCheckDispatch+0x69

      ffffd000`dac9c780 fffff803`34cf8503 : 00000000`00000000 ffffe000`ce9ee000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x247

      ffffd000`dac9c910 fffff801`efc7ba34 : ffffe000`ced983a0 fffff801`efb5c07b ffffe000`ce9dc000 fffff801`00000000 : nt!PoFxActivateComponent+0x3f

      ffffd000`dac9c940 fffff801`efb5bbfc : ffffffff`ffffffff 00000000`c0000001 ffffe000`ce9dc000 ffffcb1f`190e41b6 : dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+0x2b0

      ffffd000`dac9c9f0 fffff801`efb5b19a : ffffffff`ffffffff 00000000`002fd831 00000000`002fd831 ffffe000`ce9dc000 : dxgmms2!VidSchiSwitchContextWithCheck+0x37c

      ffffd000`dac9ca80 fffff801`efbba75d : ffffe000`cefd95e0 ffffd000`dac9cbd0 ffffe000`cefd9500 ffffe000`ce9ee000 : dxgmms2!VidSchiScheduleCommandToRun+0x41a

      ffffd000`dac9cb80 fffff801`efbba720 : ffffe000`ce9ee500 ffffe000`ce9ee000 00000000`00000080 ffffe000`cb892700 : dxgmms2!VidSchiRun_PriorityTable+0x2d

      ffffd000`dac9cbd0 fffff803`34c4fa45 : 00000204`a4bb3dfe fffff803`34d509af 00000000`00010023 00000000`016e2754 : dxgmms2!VidSchiWorkerThread+0x80

      ffffd000`dac9cc10 fffff803`34d50ae6 : ffffd000`d7db2180 ffffe000`cedff840 fffff803`34c4fa04 fffff801`ee47d8af : nt!PspSystemThreadStartup+0x41

      ffffd000`dac9cc60 00000000`00000000 : ffffd000`dac9d000 ffffd000`dac97000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

       

       

      STACK_COMMAND: kb

       

      FOLLOWUP_IP:

      dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+2b0

      fffff801`efc7ba34 4584ff test r15b,r15b

       

      SYMBOL_STACK_INDEX: 4

       

      SYMBOL_NAME: dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker+2b0

       

      FOLLOWUP_NAME: MachineOwner

       

      MODULE_NAME: dxgkrnl

       

      IMAGE_NAME: dxgkrnl.sys

       

      DEBUG_FLR_IMAGE_TIMESTAMP: 57a1b59f

       

      IMAGE_VERSION: 10.0.10586.545

       

      BUCKET_ID_FUNC_OFFSET: 2b0

       

      FAILURE_BUCKET_ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker

       

      BUCKET_ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker

       

      ANALYSIS_SOURCE: KM

       

      FAILURE_ID_HASH_STRING: km:av_dxgkrnl!dxgadapter::setpowercomponentactivecbworker

       

      FAILURE_ID_HASH: {a533151a-1aae-b601-c7af-0921bbb04a67}

       

      Followup: MachineOwner
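      For anyone reading the analysis above, here is a rough Python sketch that decodes the four bugcheck 0xA arguments and the image timestamp into plain terms. The function names are made up for illustration, the IRQL table lists only the lowest three levels, and treating DEBUG_FLR_IMAGE_TIMESTAMP as seconds since the Unix epoch is an assumption about how that field is stored.

```python
from datetime import datetime, timezone

# Only the three lowest IRQL levels are listed; anything else is shown numerically.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}

def decode_bugcheck_a(arg1, arg2, arg3, arg4):
    """Turn the four IRQL_NOT_LESS_OR_EQUAL arguments into a readable summary."""
    return {
        "referenced_address": hex(arg1),
        "irql": IRQL_NAMES.get(arg2, f"IRQL {arg2}"),
        "operation": "write" if arg3 & 0x1 else "read",  # bit 0 of Arg3
        "execute": bool(arg3 & 0x8),                      # bit 3 of Arg3
        "faulting_instruction": hex(arg4),
    }

def image_timestamp_utc(hex_stamp):
    """Assumption: the timestamp is seconds since the Unix epoch, in hex."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)

# Values copied from the analysis output above.
summary = decode_bugcheck_a(0x40, 0x2, 0x0, 0xfffff80334cf8503)
stamp = image_timestamp_utc("57a1b59f")

print(summary)
print(stamp.date())
```

      Read this way, the crash is a read (not a write or an execute) of address 0x40 while running at DISPATCH_LEVEL, and under the stated assumption the dxgkrnl.sys image dates from August 2016.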

       

       

      Some results have been pointing me towards disabling RPTM, but I'm not using Intel for anything outside of the integrated onboard NIC, so I'm still at a loss as to what is causing this. I've still got a few ideas up my sleeve though.

       

       

       

      The faulting IP and the call are both throwing me off.

       

      Faulting IP: nt!PoFxActivateComponent+3f

      Faulting Bucket ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker

      Module: dxgkrnl.sys
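      One thing that can be checked by hand is the address the faulting instruction touched. The sketch below recomputes it from the trap frame registers for `mov rdx,qword ptr [rdx+rax*8]`. The helper name is hypothetical, and the debugger itself warns that trap frame registers may be zeroed or incorrect, so treat this as a consistency check rather than proof.

```python
def effective_address(base, index, scale, disp=0):
    """x86-64 effective address: base + index*scale + disp, wrapped to 64 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF

# Register values copied from the trap frame in the dump above.
rdx = 0x0  # base register
rax = 0x8  # index register

addr = effective_address(rdx, rax, 8)
print(hex(addr))
```

      The result lines up with Arg1 (memory referenced) in the analysis, which is consistent with a null base pointer being indexed at element 8 inside PoFxActivateComponent, although the dump alone doesn't say why that pointer would be null.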

       

      It sounds like, while I had Witcher 3 playing, Windows attempted to flip power on the display adapter when it was already running, freaking the kernel out and faulting the system. I could be completely off, but I'm basing this on what the SetPower callback is documented to do: DxgkCbSetPowerComponentActive routine (Windows Drivers)

       

       

      I had an idea to underclock my GPU a few MHz, and increase voltage slightly, but I've not actually done this yet as I want to be able to replace this if I royally FUBAR the card doing this.

        • Re: Odd Kernel Crash, BSOD, IRQL Problems
          timtim32

          Did you find any solutions to this bluescreen?

            • Re: Odd Kernel Crash, BSOD, IRQL Problems
              noodlesdefyyou

              Not yet. I've opened a support case with AMD to see if they have any insight, and I'm about to open one up with Microsoft. Do you have crash dumps available on a DropBox (or other file-sharing site) link? If you've got the same stack trace and overall behavior as I do, posting your system specifications would be greatly appreciated as well. You (or anyone else stumbling upon this) can use Speccy to automatically upload your entire system specs with a convenient link. Otherwise you can run dxdiag, save it, and include it with a zip containing the dmp files.

               

              Official Speccy Link

               

               

              Edit: The key (I believe so far) with my crash dumps is the dx call being made triggering the IRQL mismatch, specifically:

               

              FAULTING_IP:

              nt!PoFxActivateComponent+3f

              FAILURE_BUCKET_ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker
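              To compare dumps from different machines, a rough Python sketch like the one below can pull those fields out of pasted analysis text. The function name `key_fields` and the default list of keys are illustrative choices, not part of any debugger tooling, and it assumes the plain `KEY: value` layout shown in this thread.

```python
def key_fields(analysis_text,
               keys=("FAULTING_IP", "FAILURE_BUCKET_ID", "BUCKET_ID", "MODULE_NAME")):
    """Collect selected 'KEY: value' fields from pasted analysis text.

    If a key has no value on its own line (as FAULTING_IP does),
    the next non-blank line is used instead.
    """
    lines = [ln.strip() for ln in analysis_text.splitlines()]
    found = {}
    for i, line in enumerate(lines):
        key, sep, value = line.partition(":")
        if not sep or key not in keys:
            continue
        value = value.strip()
        if not value:
            value = next((ln for ln in lines[i + 1:] if ln), "")
        found[key] = value
    return found

# Sample text copied from the post above.
sample = """
FAULTING_IP:

nt!PoFxActivateComponent+3f

FAILURE_BUCKET_ID: AV_dxgkrnl!DXGADAPTER::SetPowerComponentActiveCBWorker
"""

fields = key_fields(sample)
for name, value in fields.items():
    print(f"{name}: {value}")
```

              Running the same function over two analyses and comparing the resulting dictionaries gives a quick first answer to whether two crashes share a signature.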

                • Re: Odd Kernel Crash, BSOD, IRQL Problems
                  timtim32

                  I am very interested what comes out of the AMD/Microsoft support case.

                   

                  Here are my system specs: http://speccy.piriform.com/results/DPDczbuVgT0ESnQDo7cLcDt

                  and the minidump:  090116-52468-01 - Copy.dmp - Google Drive

                   

                  I noticed that watching YouTube in full screen gives me the blue screen, but not right away. Most of the time it happens after a minute or two of watching in full screen.

                  I also think that it only happens when I have my HDMI tv connected.

                   

                  I updated my drivers and did a clean install of the AMD drivers but with no luck.

                    • Re: Odd Kernel Crash, BSOD, IRQL Problems
                      noodlesdefyyou

                      I'll definitely post the update/solution (if any) once I get this figured out. I really dislike coming across 'the same issue' only to find things like 'I'll PM you the answer' or 'Figured it out', with no solution.

                       

                      I've generated at least 2 dumps in the past week outside of gaming entirely. Once was surfing reddit, maybe 4-6 tabs open (no video on any tabs), and another while watching Cowboy Bebop on VLC Media Player. The day prior I had watched 2 entire movies through VLC, not a single problem.

                       

                      Regarding your dump file though, I noticed this:

                      FAILURE_BUCKET_ID:  AV_dxgkrnl!TraceDxgkFunctionProfiler

                      BUCKET_ID:  AV_dxgkrnl!TraceDxgkFunctionProfiler

                      PRIMARY_PROBLEM_CLASS:  AV_dxgkrnl!TraceDxgkFunctionProfiler

                       

                      This isn't quite identical to the failure bucket I am getting, and the stack trace is also a bit different. I'm not sure what troubleshooting you have already done, but the numbered steps further down are what I would do.

                       

                      READ_ADDRESS: fffff80207c05520: Unable to get MiVisibleState

                      0000000000000020

                       

                      STACK_TEXT: 

                      nt!KeBugCheckEx

                      nt!KiBugCheckDispatch+0x69

                      nt!KiPageFault+0x247

                      nt!PoFxActivateComponent+0x3f

                      dxgkrnl!TraceDxgkFunctionProfiler+0xa014

                      dxgmms2+0xbbfc

                      dxgmms2+0xb19a

                      dxgmms2!VidMmInterface+0x458ad

                      dxgmms2!VidMmInterface+0x45870

                      nt!PspSystemThreadStartup+0x41

                      nt!KiStartSystemThread+0x16

                       

                      (I took the memory values out to clean it up a bit, the start of the faulting thread is at the bottom, StartSystemThread, and the resulting crash is at the top)
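                      That kind of cleanup can be scripted. Here is a minimal Python sketch, assuming each raw STACK_TEXT line ends with ` : module!function+offset` the way the lines in the first post do. The function name is made up for illustration.

```python
def strip_stack_addresses(stack_text):
    """Keep only the symbol (module!function+offset) from each STACK_TEXT line."""
    frames = []
    for line in stack_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The symbol is whatever follows the last " : " separator.
        frames.append(line.rsplit(" : ", 1)[-1])
    return frames

# Two raw lines copied from the STACK_TEXT in the first post.
raw = """
ffffd000`dac9c638 fffff803`34d564e9 : 00000000`0000000a 00000000`00000040 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx

ffffd000`dac9c910 fffff801`efc7ba34 : ffffe000`ced983a0 fffff801`efb5c07b ffffe000`ce9dc000 fffff801`00000000 : nt!PoFxActivateComponent+0x3f
"""

frames = strip_stack_addresses(raw)
for frame in frames:
    print(frame)
```

                      Splitting on the last ` : ` (with spaces) rather than on every colon keeps C++ names such as `DXGADAPTER::SetPowerComponentActiveCBWorker` intact.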

                       

                      1) Use DDU (in safe mode) to completely remove all AMD Drivers, and reinstall directly from AMD Driver Download. Don't let Windows 'install' the driver for you like it does natively. I've seen Windows overwrite the proper AMD driver with older driver files, causing incompatibilities and corruption.

                       

                      2) Update your BIOS. Your BIOS is on revision 1.5; the latest is 1.9 according to ASRock's Website. This may or may not help, so I'd treat it as an 'I've tried everything else' option. Make sure you read the instructions for proper BIOS flashing 4 times, go to sleep, wake up and read the instructions again. This operation can brick your motherboard if you do it incorrectly, and in most cases is not necessary except to support new chips on a board released before the chip itself. First example to come to mind is the AMD Phenom2 x6 1090T on the Asus M4A785-M.

                       

                      3) Since you have an integrated Intel Graphics chip with a dedicated AMD Card, you MIGHT get some luck out of this Microsoft Solution, to disable Runtime Power Management.

                       

                       

                      I've just realized what time it is, and I am well past time to get some sleep. I'll poke at this some more this weekend if I have the time, let me know if the above helps at all.