debugging kernel hang

Discussion created by Meteorhead on Nov 24, 2010
Latest reply on Dec 11, 2010 by himanshu.gautam

Hi all!

My question is: how should one go about debugging a kernel that tends to hang when given too much work? On a CPU I would suspect a memory leak if the hanging is work dependent, but is that possible at all on a GPU? No memory is allocated dynamically, so I'd think the answer is 'no way'.

I do not want to blame the compiler, the drivers or the hardware. I'm just curious what the typical tactics are for finding the cause of a problem like this.

The simulation is written so that every kernel call gives the GPU more and more work: the for loop processed inside the kernel gets longer with each call. The loop is also lengthened by the size of the simulated area, and there is a strong correlation between the amount of work given and the tendency to hang.
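
To make the growth pattern concrete, here is a rough host-side model in Python. It is only a sketch under stated assumptions, not the actual simulation code: the linear growth of the loop, the one-work-item-per-cell layout, and the names `loop_length`, `work_per_call`, `first_call_over_budget` and `budget` are all hypothetical and chosen purely for illustration.

```python
def loop_length(call_index, base_length=1, growth=1):
    """Assumed length of the kernel's inner for loop on a given call.

    Linear growth is an assumption; the real simulation may grow differently.
    """
    return base_length + growth * call_index


def work_per_call(call_index, width, height):
    """Total loop iterations for one kernel call.

    Assumes one work-item per cell of a width x height area, each running
    the same inner loop.
    """
    return width * height * loop_length(call_index)


def first_call_over_budget(width, height, budget, max_calls=10_000):
    """Index of the first call whose total work exceeds `budget`.

    `budget` is a hypothetical threshold in loop iterations. Returns None
    if no call within `max_calls` exceeds it.
    """
    for call_index in range(max_calls):
        if work_per_call(call_index, width, height) > budget:
            return call_index
    return None


if __name__ == "__main__":
    # Tabulate how quickly different area sizes reach the same work level.
    for side in (64, 128, 256):
        call = first_call_over_budget(side, side, budget=10_000_000)
        print(f"{side}x{side}: first call over budget = {call}")
```

Comparing the call index at which the real hang appears against a table like this, for several area sizes, would show whether the hang tracks total work per call or something else, such as one particular loop length.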

How would you go about finding the problem?