Journeyman III

Problems with GPU Instability

I've written an application which utilizes the Stream SDK to provide GPGPU acceleration. I'm having problems with the GPU's stability when I run the application.

I've incorporated a way to adjust the size of the data pieces (by "pieces," I mean arrays of numbers) that are sent to the GPU, and there is an overhead for each piece, so the larger the pieces, the better. However, I've found that the larger these "pieces" are, the more unstable the GPU becomes. At times when I am running the application, the screen will freeze, turn black for a second, then come back with a warning that the "display driver has stopped responding and has recovered."
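
The trade-off described above can be made concrete with a simple cost model. This is a sketch under an assumption, not something measured from the Stream SDK: each piece is taken to cost a fixed overhead (setup, transfer, kernel launch) plus a cost proportional to its number of elements. The function name and the timings in the usage note are hypothetical.

```python
def total_time(n_elements: int, piece_size: int,
               overhead_s: float, per_element_s: float) -> float:
    """Estimate total run time when n_elements are processed in pieces.

    Hypothetical model: every piece pays a fixed overhead_s, and every
    element costs per_element_s regardless of which piece it is in.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    # Integer ceiling division: the last piece may be only partially full.
    pieces = (n_elements + piece_size - 1) // piece_size
    return pieces * overhead_s + n_elements * per_element_s
```

With 10,000,000 elements, a 1 ms overhead and 0.1 µs per element, pieces of 10,000 give roughly 2.0 s in total, while pieces of 1,000,000 give roughly 1.01 s. The overhead shrinks as pieces grow, but each individual kernel call also gets longer, which matters if a single call runs for too long.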

I have no idea what's causing this, and I don't even know where to begin. Is this a defect/limitation in the SDK, or did I screw something up in my code? Are there any known solutions? I'd really love to get this working.

Thanks for any help you can offer.


EDIT: Occasionally the application will report "failed to allocate memory."

This makes me suspect a memory leak, but I don't think I'm dynamically allocating GPU memory anywhere.

Journeyman III

Maybe you:

1. Forgot to set DisableBugCheck in the registry. The usual symptom is a message that the driver stopped responding after 30 seconds.

2. Overclocked the GPU memory. The 4850 in particular is very sensitive to memory overclocking; I've had driver crashes from it.

3. Did something wrong in your code, e.g. an out-of-bounds index. In a scatter kernel the fault behaviour is more aggressive, such as a screen freeze.

4. Allocated too much memory. Do the math to calculate the space required first. I got "application stopped responding" when allocating 1 GB on a 512 MB card.
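
Point 3 can be illustrated with a CPU-side emulation of a scatter, where each source element is written to an arbitrary destination index. This is plain Python for illustration, not Brook+ or CAL code, and the function name is hypothetical; the idea is to validate destination indices (here on the host, before anything runs on the GPU) so that a bad index is reported instead of turning into an out-of-bounds write.

```python
from typing import List, Sequence, Tuple


def scatter_checked(values: Sequence[float], dest_indices: Sequence[int],
                    out_size: int, fill: float = 0.0) -> Tuple[List[float], List[int]]:
    """Emulate a scatter: out[dest_indices[i]] = values[i].

    Writes whose destination falls outside [0, out_size) are skipped, and the
    source positions of those writes are returned so the caller can see them.
    """
    if len(values) != len(dest_indices):
        raise ValueError("values and dest_indices must have the same length")
    out = [fill] * out_size
    rejected = []
    for i, (v, d) in enumerate(zip(values, dest_indices)):
        # Explicit range check; it also catches negative indices,
        # which Python lists would otherwise silently wrap around.
        if 0 <= d < out_size:
            out[d] = v
        else:
            rejected.append(i)  # would have been an out-of-bounds write
    return out, rejected
```

Here `scatter_checked([1.0, 2.0, 3.0], [2, 5, 0], 3)` returns `([3.0, 0.0, 1.0], [1])`: index 5 is out of range for a 3-element output, so source position 1 is reported rather than written.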

For your problem, I think you should calculate your memory requirements first. Freeing some memory before allocating new space for another problem is also a good approach.
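
The "do the math first" advice can be sketched as a small budget check. The names are hypothetical, sizes assume 4-byte floats, and the 25% headroom is an arbitrary placeholder for memory the driver and desktop may already be using on a display GPU, not a figure from the SDK.

```python
MB = 1024 * 1024


def required_bytes(element_counts, bytes_per_element=4):
    """Total size of all buffers, given each buffer's element count."""
    return sum(n * bytes_per_element for n in element_counts)


def fits_on_card(element_counts, card_bytes, bytes_per_element=4, headroom=0.25):
    """True if the buffers fit in card memory minus a reserved fraction."""
    usable = int(card_bytes * (1.0 - headroom))
    return required_bytes(element_counts, bytes_per_element) <= usable


def max_piece_elements(card_bytes, arrays_per_piece, bytes_per_element=4, headroom=0.25):
    """Largest per-array element count when each piece needs
    arrays_per_piece equally long buffers on the card at the same time."""
    usable = int(card_bytes * (1.0 - headroom))
    return usable // (arrays_per_piece * bytes_per_element)
```

For the case in point 4, a 1 GB buffer (268,435,456 floats) does not fit on a 512 MB card: `fits_on_card([268_435_456], 512 * MB)` is `False`. With three float arrays per piece, `max_piece_elements(512 * MB, 3)` gives 33,554,432 elements per array.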

Adept II

Your first problem, especially if you run under Vista, is the Windows watchdog timer (Timeout Detection and Recovery, TDR).
In Vista, if the video driver does not respond for more than 2 seconds, the OS kernel restarts it.
So if a display is attached to the GPU and the Vista desktop is extended onto that display, each kernel call is limited to 2 seconds. AFAIK this limit doesn't apply to GPUs with no display attached (a secondary GPU, or a Tesla for NVIDIA cards).
This threshold can be changed via the registry according to an MS article, but I have actually observed such driver restarts even with the watchdog "disabled" via the registry.
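
Staying under the watchdog limit can be sketched as a piece-size calculation: measure the per-call overhead and the per-element kernel time on your own card, then pick the largest piece whose estimated call time leaves a margin below the limit. The 2-second default matches the post above; the linear timing model, the 50% safety factor and the function name are assumptions for illustration.

```python
def max_piece_for_watchdog(per_element_s, overhead_s, limit_s=2.0, safety=0.5):
    """Largest piece size whose estimated call time fits within safety * limit_s.

    Assumed model: call_time = overhead_s + piece_size * per_element_s.
    Returns 0 if the fixed overhead alone already exceeds the budget.
    """
    if per_element_s <= 0:
        raise ValueError("per_element_s must be positive")
    budget = limit_s * safety - overhead_s
    if budget <= 0:
        return 0
    return int(budget // per_element_s)
```

For example, if a call costs 10 ms of overhead plus 1 µs per element, the sketch allows roughly 990,000 elements per piece with the default 50% margin. Given the report that restarts happened even with the watchdog "disabled", keeping each call well under the limit seems safer than relying on the registry setting alone.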