2 Replies Latest reply on Sep 6, 2009 11:04 AM by Raistmer

    Problems with GPU Instability


      I've written an application which utilizes the Stream SDK to provide GPGPU acceleration. I'm having problems with the GPU's stability when I run the application.

      I've incorporated a way to adjust the size of the data pieces (by "pieces," I mean arrays of numbers) that are sent to the GPU. Each piece carries a fixed overhead, so larger pieces are more efficient. However, I've found that the larger the pieces are, the more unstable the GPU becomes. At times while the application is running, the screen freezes, goes black for a second, then comes back with a warning that the "display driver has stopped responding and has recovered."
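      The trade-off described above can be written as a rough cost model. If each piece pays a fixed overhead and each element a fixed processing cost, then N elements split into pieces of size P take about ceil(N / P) * overhead + N * per_element. This is a hypothetical illustration: the function names and all timing values are assumptions, not Stream SDK calls, and real overheads have to be measured on your own setup.

```python
import math


def total_time(n_elements, piece_size, overhead_s, per_element_s):
    """Estimate run time for n_elements processed in pieces of piece_size.

    Every piece pays a fixed overhead (e.g. transfer setup and kernel launch)
    plus a per-element cost, so fewer, larger pieces amortize the overhead.
    """
    n_pieces = math.ceil(n_elements / piece_size)
    return n_pieces * overhead_s + n_elements * per_element_s


def split_into_pieces(data, piece_size):
    """Yield consecutive slices of data, each at most piece_size long."""
    for start in range(0, len(data), piece_size):
        yield data[start:start + piece_size]


if __name__ == "__main__":
    # Hypothetical timings: 1 ms overhead per piece, 10 ns per element.
    for size in (1_000, 10_000, 100_000):
        print(size, total_time(1_000_000, size, 0.001, 1e-8))
```

      With these made-up numbers, 100,000-element pieces pay the overhead 10 times instead of 1,000 times, which is why the model always favors bigger pieces. It says nothing about how long a single call takes, which is where the stability problems come in.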

      I have no idea what's causing this, and I don't even know where to begin. Is this a defect/limitation in the SDK, or did I screw something up in my code? Are there any known solutions? I'd really love to get this working.

      Thanks for any help you can offer.


      EDIT: Occasionally the application will report "failed to allocate memory."

      This makes me suspect a memory leak, but as far as I know I'm not dynamically allocating memory for the GPU anywhere.

        • Problems with GPU Instability

          Maybe you:

          1. Forgot to set DisableBugCheck in the registry. Without it, you usually get told that the driver stopped responding after 30 seconds.

          2. Overclocked the GPU memory. The 4850 in particular is very sensitive to memory overclocking; I've had driver crashes from it.

          3. Did something wrong in your code, e.g. an out-of-bounds index. Faults are more aggressive in scatter kernels, where they can freeze the screen.

          4. Allocated too much memory. Do the math to calculate the space required first. I've gotten "application stopped responding" when allocating 1 GB on a 512 MB card.

          For your problem, I think you should calculate your memory requirements first. Freeing memory you no longer need before allocating space for the next problem is a good approach too.
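          The advice to "do the math" first can be sketched as a simple budget check. These are hypothetical helpers, not part of the Stream SDK: they assume 32-bit elements, and the 25% reserve for memory already used by the display and driver is a guess to tune, not a documented figure.

```python
import math


def bytes_required(buffer_shapes, bytes_per_element=4):
    """Total bytes for a list of buffers, each given as a tuple of dimensions.

    bytes_per_element=4 assumes 32-bit floats; change it for other types.
    """
    return sum(math.prod(shape) * bytes_per_element for shape in buffer_shapes)


def fits_on_card(buffer_shapes, card_bytes, reserve_fraction=0.25,
                 bytes_per_element=4):
    """Return True if the buffers fit after holding back a share of card memory.

    reserve_fraction is a hypothetical allowance for memory the display,
    driver and other programs already use; it is not a documented figure.
    """
    usable = card_bytes * (1.0 - reserve_fraction)
    return bytes_required(buffer_shapes, bytes_per_element) <= usable


MB = 1024 ** 2

if __name__ == "__main__":
    card = 512 * MB
    # Two 4096 x 4096 float buffers: 128 MB in total.
    print(fits_on_card([(4096, 4096), (4096, 4096)], card))  # True
    # One buffer of 256 Mi floats is 1 GB.
    print(fits_on_card([(256 * MB,)], card))  # False
```

          The second call is the 1 GB-on-a-512 MB-card case from item 4; checking it on the host before allocating avoids finding out from the driver.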

          • Problems with GPU Instability
            Your first problem, especially if you're running Vista, is the Windows watchdog timer.
            In Vista, if the video driver does not respond for more than 2 seconds, the OS kernel restarts it.
            So if a display is attached to the GPU and Vista's desktop is extended onto that display, each kernel call is limited to 2 seconds. AFAIK this limit doesn't apply to GPUs that aren't attached to a display (a secondary GPU, or a Tesla for nVidia's cards).
            According to an MS article, this threshold can be changed via the registry, but I have actually observed such driver restarts even with the watchdog "disabled" in the registry.
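            One way to stay under the watchdog is to cap the piece size so that each kernel call finishes well inside the 2-second limit. This sketch assumes you have measured the per-element and per-call times yourself; the function name and the 50% safety margin are hypothetical choices, not values from the SDK or from Microsoft.

```python
import math


def max_piece_size(per_element_s, overhead_s=0.0, limit_s=2.0, safety=0.5):
    """Largest piece size whose kernel call should finish inside the watchdog limit.

    per_element_s and overhead_s are timings you measure on your own card.
    limit_s defaults to the 2-second Vista timeout; safety=0.5 targets half
    of it to leave headroom for timing variation between calls.
    """
    budget = limit_s * safety - overhead_s
    if budget <= 0:
        raise ValueError("per-call overhead alone exceeds the time budget")
    n = math.floor(budget / per_element_s)
    if n < 1:
        raise ValueError("a single element already exceeds the time budget")
    return n


if __name__ == "__main__":
    # Hypothetical measurement: about 1 ms per element, 0.1 s launch overhead.
    print(max_piece_size(2 ** -10, overhead_s=0.1))
```

            For example, at 2**-10 s (about 1 ms) per element and no overhead, max_piece_size(2 ** -10) returns 1024. Since driver restarts were still seen with the watchdog "disabled" in the registry, keeping each call short is the more reliable fix than relying on the registry setting.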