7 Replies Latest reply on Nov 16, 2011 6:24 PM by fesc2000

    Display driver hangs

    fesc2000

      Hi,

      My OpenCL application regularly causes the display driver to hang and get restarted (Windows 7 64-bit, Catalyst 11.11).

       I'm wondering whether anyone has had the same experience and what to do about it (or how to debug it). Maybe there are some errors or constructs that are known to cause a driver crash (although I would always consider such behaviour a driver bug).

      The same application runs fine under Linux.

      Thanks..

       

        • Display driver hangs
          antzrhere

          Are you certain it's not a case of the display timing out? On Windows, by default, if a kernel runs for more than 5 seconds the display adapter is reset. It may just behave differently on Linux.

            • Display driver hangs
              fesc2000

              That might be the case; Windows just tells me the driver stopped responding.

              On the other hand, my kernels shouldn't take that long; there are no loops, etc.

              Is there a way to increase that timeout value?

                • Display driver hangs
                  antzrhere

                  To disable the timeout, set:

                   

                  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel



                  to 0 for no timeout, or to 3 to restore the default behaviour (this is quoted for Vista from the AMD SDK 2.5 release notes).

                  The fact that you have no loops or barriers suggests this is not your problem. Does the screen black out for a second? If so, that would suggest a timeout, since the device is being reset. If not, it may be a fatal error in the compiler. I've had this problem once with a kernel executed on the CPU: the program window went white while compiling and the program crashed, but it worked fine on the GPU. I guess it was a bug in the AMD OpenCL compiler, as everything up to the program build executed correctly. Have you tried your code on the CPU?
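To check what the TdrLevel value mentioned above is currently set to, something like the following could be used. This is only a sketch: the helper names `tdr_level_description` and `read_tdr_level` are made up for illustration, the registry read assumes Windows Vista or later (where `RegGetValueA` is available) and linking against Advapi32, and on other platforms the function simply reports that nothing was read. The meanings of values 1 and 2 come from Microsoft's TDR registry key documentation, not from this thread.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Human-readable meaning of a TdrLevel value (hypothetical helper). */
const char *tdr_level_description(int level)
{
    switch (level) {
    case 0:  return "timeout detection disabled";
    case 1:  return "bug check on timeout";
    case 2:  return "recover to VGA (not implemented)";
    case 3:  return "recover on timeout (default)";
    default: return "unknown";
    }
}

/* Returns the configured TdrLevel, or -1 if the value is absent or cannot
 * be read. An absent value means Windows uses its default behaviour.
 * On non-Windows builds this always returns -1. */
int read_tdr_level(void)
{
#ifdef _WIN32
    DWORD value = 0;
    DWORD size = sizeof(value);
    LSTATUS status = RegGetValueA(
        HKEY_LOCAL_MACHINE,
        "SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
        "TdrLevel",
        RRF_RT_REG_DWORD,   /* only accept a DWORD value */
        NULL, &value, &size);
    return status == ERROR_SUCCESS ? (int)value : -1;
#else
    return -1;
#endif
}
```

A caller could print `tdr_level_description(read_tdr_level())` at startup to record which timeout behaviour was active when a hang occurred.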

                    • Display driver hangs
                      fesc2000

                      When the error happens, the application and desktop (except the mouse) freeze and get restarted after a few seconds.

                      Setting TdrLevel to 0 results in a complete freeze.

                      Using the CPU wouldn't work, because the application gets too slow. The error happens after the application has been running for a while.

                      Maybe I'm writing outside a buffer, although I thought I'd taken care of this; I'll take a second look. Is there defined behaviour when doing this, or could it cause a stalled kernel/GPU?

                        • Display driver hangs
                          antzrhere

                          Sounds like your kernel is getting stuck and causing the problem, which is why it freezes completely when you disable the timeout in Windows.

                          I've never found that writing out of bounds can cause a kernel to hang (without loops); however, if you accidentally spill into some other part of memory that is in use, this could cause undefined results. As the GPU is a bit of a black box, I suppose anything could be possible.

                          Apart from loops and thread synchronisation barriers, I can't think of what else could cause a hang.

                          Could you post your kernel and any associated code? 
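One common way to rule out the out-of-bounds writes discussed above is a bounds guard at the top of the kernel, which matters when the global size is padded up to a multiple of the work-group size. The sketch below emulates that in plain C rather than OpenCL C so it can be run anywhere; `padded_global_size`, `scale_item` and `run_scale` are hypothetical names, and in a real kernel `gid` would come from `get_global_id(0)`. It is not the original poster's code.

```c
#include <assert.h>
#include <stddef.h>

/* Round the element count up to a multiple of the work-group size,
 * as host code often does before launching a kernel. */
size_t padded_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size * local_size;
}

/* Body of a hypothetical kernel for a single work-item.
 * Returns 1 if the work-item wrote to the output, 0 if it was skipped. */
int scale_item(float *out, const float *in, size_t n, size_t gid)
{
    if (gid >= n)
        return 0;               /* padding work-item: touch no memory */
    out[gid] = 2.0f * in[gid];
    return 1;
}

/* Host-side emulation of the launch: run the body once per global ID,
 * including the padding IDs beyond n. Returns the number of writes. */
size_t run_scale(float *out, const float *in, size_t n, size_t local_size)
{
    size_t global = padded_global_size(n, local_size);
    size_t writes = 0;
    size_t gid;
    for (gid = 0; gid < global; ++gid)
        writes += (size_t)scale_item(out, in, n, gid);
    return writes;
}
```

With 5 elements and a work-group size of 4, the emulated launch covers 8 work-items, and the guard keeps the last 3 from writing past the end of the buffer.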

                  • Display driver hangs
                    MicahVillmow
                    fesc2000,
                    Most likely this is the watchdog timer causing a reset of your GPU on Windows. Since a GPU is not a pre-emptible device, Windows just resets it if a thread uses all of the resources of the graphics card for longer than a set period of time.
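If the watchdog described above is the cause, one possible mitigation (not suggested in this thread, so treat it as an assumption) is to split a large launch into several smaller ones so that no single launch occupies the GPU for long. The sketch below only computes the offset and size of each piece; `chunk_t`, `chunk_count` and `chunk_at` are hypothetical names. In host code, each pair would be passed as the `global_work_offset` and `global_work_size` arguments of `clEnqueueNDRangeKernel`, with a `clFinish` or `clFlush` between launches.

```c
#include <assert.h>
#include <stddef.h>

/* One piece of a larger 1-D index range. */
typedef struct {
    size_t offset;  /* first global ID covered by this piece */
    size_t size;    /* number of work-items in this piece */
} chunk_t;

/* Number of launches needed to cover `total` work-items
 * when each launch handles at most `max_chunk` of them. */
size_t chunk_count(size_t total, size_t max_chunk)
{
    assert(max_chunk > 0);
    return (total + max_chunk - 1) / max_chunk;
}

/* Offset and size of the launch with the given index.
 * The last piece may be smaller than max_chunk. */
chunk_t chunk_at(size_t total, size_t max_chunk, size_t index)
{
    chunk_t c;
    size_t remaining;
    assert(max_chunk > 0);
    assert(index < chunk_count(total, max_chunk));
    c.offset = index * max_chunk;
    remaining = total - c.offset;
    c.size = remaining < max_chunk ? remaining : max_chunk;
    return c;
}
```

The right value for `max_chunk` depends on the kernel and the device, and would have to be found by timing. Note that each piece's size may also need to be a multiple of the work-group size if a local size is specified.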