I am working on OpenCL applications on an AMD GPU. My OS is Ubuntu 14.04.
My code often has bugs, and I keep having to reboot because my program keeps running and never stops.
Is there any driver support for this, such as a soft reset?
Don't reboot, just wait a few seconds more!
If your kernel doesn't finish within about 10 seconds, the driver will reset the GPU, and you can then kill your application and run it again.
Does this method also work on an APU?
My kernel has been hung for 7 minutes, and I cannot kill it with a command.
Hi,
That's a good question.
If you have a kernel that you want to terminate, I recommend trying to kill the command queue.
Call clReleaseCommandQueue on the queue that executes the kernel; hopefully, killing the queue will terminate the kernel immediately.
Let me know if you try it; I'm interested in hearing the result.
Regards,
Tomer Gal, CTO at OpTeamizer
Do you mean writing another program to kill the command queue?
I also have an NVIDIA GPU card; its watchdog kills a kernel if the execution time is too long. Does AMD have the same design?
That's an option.
You can write a listener within your process that waits for a notification to kill the command queue.
Alternatively, you can write another process that notifies your OpenCL process to kill the command queue.
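A minimal POSIX sketch of the listener idea, assuming a Linux host: a helper thread blocks in sigwait for SIGUSR1, so another process (or a shell running `kill -USR1 <pid>`) can request the kill. The names kill_requested and start_kill_listener are hypothetical, and the OpenCL side is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical flag: becomes 1 once a kill request has arrived. */
atomic_int kill_requested = 0;

/* Signals the listener waits for; must outlive the listener thread. */
static sigset_t kill_signals;

static void *kill_listener(void *arg)
{
    (void)arg;
    int sig;
    /* Sleeps until SIGUSR1 is pending, then consumes it. */
    if (sigwait(&kill_signals, &sig) == 0 && sig == SIGUSR1)
        atomic_store(&kill_requested, 1);
    return NULL;
}

/* Hypothetical helper. Call it first thing in main(), before creating the
 * OpenCL context: threads created afterwards (including any the runtime
 * starts) inherit the blocked mask, so SIGUSR1 is consumed by sigwait
 * instead of terminating the process. Returns 0 on success. */
int start_kill_listener(pthread_t *tid)
{
    sigemptyset(&kill_signals);
    sigaddset(&kill_signals, SIGUSR1);
    int rc = pthread_sigmask(SIG_BLOCK, &kill_signals, NULL);
    if (rc != 0)
        return rc;
    return pthread_create(tid, NULL, kill_listener, NULL);
}
```

In the host program, the loop that waits for the kernel would test atomic_load(&kill_requested) and, if it is set, call clReleaseCommandQueue. Build with `-pthread`.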
Before you do that, I recommend writing a test case: launch an OpenCL kernel that runs while(true), call Sleep(5000) on the CPU, and then release the command queue.
Regards,
Tomer Gal, CTO at OpTeamizer
lennox wrote:
Do you mean writing another program to kill the command queue?
I also have an NVIDIA GPU card; its watchdog kills a kernel if the execution time is too long. Does AMD have the same design?
Not necessary. clEnqueueNDRangeKernel is asynchronous, so you can wait as long as you need (sleep or usleep) and then check the kernel's event with clGetEventInfo (CL_EVENT_COMMAND_EXECUTION_STATUS). If the event is done (CL_COMPLETE), call clReleaseEvent; otherwise, kill the queue.
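A minimal sketch of this wait-then-check approach in C, assuming the kernel was enqueued with an event (the last argument of clEnqueueNDRangeKernel) and a POSIX host for usleep. The names next_action, decide, and wait_for_kernel, and the HAVE_OPENCL switch, are hypothetical; the switch only exists so the decision rule can be compiled and checked without an OpenCL runtime. Keep in mind the spec text quoted later in the thread: releasing the queue may not stop a kernel that is still running.

```c
/* Hypothetical names: next_action, decide, wait_for_kernel, HAVE_OPENCL.
 * None of these are part of the OpenCL API. */
#include <stdint.h>

#ifdef HAVE_OPENCL
#include <CL/cl.h>
#include <unistd.h>   /* usleep */
#else
/* Stand-ins with the same values as <CL/cl.h>, so decide() can be
 * compiled and checked without an OpenCL runtime. */
typedef int32_t cl_int;
#define CL_COMPLETE  0x0
#define CL_RUNNING   0x1
#define CL_SUBMITTED 0x2
#define CL_QUEUED    0x3
#endif

typedef enum {
    ACTION_KEEP_WAITING,  /* queued, submitted or running, time left */
    ACTION_RELEASE_EVENT, /* kernel finished: just release its event */
    ACTION_KILL_QUEUE,    /* deadline passed: give up on the queue */
    ACTION_ERROR          /* negative status or failed query */
} next_action;

/* "If event is Done, releaseEvent, else kill queue", plus a deadline. */
next_action decide(cl_int status, unsigned waited_ms, unsigned timeout_ms)
{
    if (status < 0)
        return ACTION_ERROR;          /* command terminated abnormally */
    if (status == CL_COMPLETE)
        return ACTION_RELEASE_EVENT;  /* done, even if it was late */
    if (waited_ms >= timeout_ms)
        return ACTION_KILL_QUEUE;
    return ACTION_KEEP_WAITING;
}

#ifdef HAVE_OPENCL
/* Polls `ev` every poll_ms milliseconds (poll_ms > 0) until the kernel
 * completes, fails, or roughly timeout_ms has elapsed. */
next_action wait_for_kernel(cl_command_queue queue, cl_event ev,
                            unsigned timeout_ms, unsigned poll_ms)
{
    unsigned waited = 0;
    for (;;) {
        cl_int status;
        cl_int err = clGetEventInfo(ev, CL_EVENT_COMMAND_EXECUTION_STATUS,
                                    sizeof status, &status, NULL);
        next_action a = (err != CL_SUCCESS) ? ACTION_ERROR
                                             : decide(status, waited, timeout_ms);
        if (a == ACTION_KEEP_WAITING) {
            usleep(poll_ms * 1000u);
            waited += poll_ms;
            continue;
        }
        clReleaseEvent(ev);
        if (a != ACTION_RELEASE_EVENT) {
            /* The spec only deletes the queue after its commands finish,
             * so this may NOT stop a kernel that is still running. */
            clReleaseCommandQueue(queue);
        }
        return a;
    }
}
#endif
```

The real path would be built with something like `gcc -DHAVE_OPENCL host.c -lOpenCL`. Checking completion before the deadline means a kernel that finishes late is still treated as done rather than killed.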
Hi Nibal,
Thanks for your answer. Which API can I use to kill a command queue? clReleaseCommandQueue?
Yes.
I just want to share one point in this regard: I doubt that clReleaseCommandQueue will serve the purpose here.
The clReleaseCommandQueue documentation says:
"After the command_queue reference count becomes zero and all commands queued to command_queue have finished (e.g., kernel executions, memory object updates, etc.), the command-queue is deleted."
dipak wrote:
I just want to share one point in this regard: I doubt that clReleaseCommandQueue will serve the purpose here.
The clReleaseCommandQueue documentation says:
"After the command_queue reference count becomes zero and all commands queued to command_queue have finished (e.g., kernel executions, memory object updates, etc.), the command-queue is deleted."
@youwei: Oops! I was just building on tomer_gal's comments. It seems you will have to find another way to kill the kernel 😞
@dipak: Is there a way to kill an already executing kernel?