OpenCL kernel crashes with error -5

I am developing a multi-GPU OpenCL program.

I have to launch a very large number of threads. Based on the resources my kernel uses (registers and local memory), I launch only as many threads at a time as make the best use of the GPU. So I launch my kernel n times in total (n = N/M), where N is the total number of threads I have to launch, M is the number of threads I can launch at a time, and n is the number of kernel launches needed.

Launching the kernel works for a few iterations, but it fails when I run more iterations: the kernel crashes with error -5.

I get a "CL_OUT_OF_RESOURCES" error when calling clEnqueueReadBuffer(). I thought this meant I was accessing GPU memory out of bounds, but the same code works fine for a few iterations.

I am using:

GPU: NVIDIA GeForce GTX 295
Platform version: OpenCL 1.1
Operating system: Ubuntu 11.04

My questions are:

What could be the reasons for these errors and crashes?

What causes a "CL_OUT_OF_RESOURCES" error?

Waiting for your quick reply.

Thanks in advance.

Maybe add a clFinish() after every 1000-10000 clEnqueueNDRangeKernel() calls?

Actually, I create and release the input buffer for each clEnqueueNDRangeKernel() call, and I call clWaitForEvents() after each clEnqueueNDRangeKernel() to wait for its execution to complete.

Is this the correct way?


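No, it is not. You should create the input buffers once and reuse them, filling and reading them with clEnqueueWriteBuffer()/clEnqueueReadBuffer() or with clEnqueueMapBuffer()/clEnqueueUnmapMemObject().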


Thanks again!

After adding clFinish(), my code runs for a large number of iterations.

I am adding clFinish() after each iteration. What is the benefit of adding it only after every 1000-10000 clEnqueueNDRangeKernel() calls?


The benefit is that the driver can send bigger batches to the GPU, which reduces overhead. The right batch size depends on how demanding a single run of your kernel is.

Why is it not correct?

I have one more question about clFinish() vs clWaitForEvents().

clFinish() waits until all previously queued commands in the command queue have completed, and clWaitForEvents() also waits for the commands identified by the event objects to complete.

Then what is wrong with using clWaitForEvents()?


Hi nou,

Thanks once again for your quick reply.

My program now runs and no longer crashes with error -5, but it gets killed after 600 iterations.

Assume I have this loop:

```c
for (i = 0; i < n; i++) {
    clEnqueueWriteBuffer(...);    /* copy input data from the CPU to the GPU */
    clEnqueueNDRangeKernel(...);  /* launch the kernel */
    clFinish(...);                /* wait for all commands in the queue to finish */
    clEnqueueReadBuffer(...);     /* copy results from the GPU back to the CPU */
}
```


where n = N/M is the number of kernel launches needed, N is the total number of threads I have to launch, and M is the number of threads I can launch at a time.

Why does it get killed when n is greater than 600?


I have the same problem with NVIDIA GPUs in a PCI-E bandwidth test that I have written.

It seems to be a problem in the NVIDIA driver. I tried to report this bug to NVIDIA, but because their forums and related services have been offline since they were hacked, I have not been able to report it.

The AMD driver is much better in this respect. It runs and runs without any problems in my test suite.

By the way: yes, on NVIDIA GPUs you can work around the problem for clEnqueueReadBuffer() and clEnqueueWriteBuffer() by adding clFinish(), but that does not work for clEnqueueMapBuffer(). I can only say this for pinned memory, though.

Just as a hint: use pinned memory and clEnqueueMapBuffer(). It is MUCH faster.