gopal_hc
Journeyman III

OpenCL program is getting killed?

I am developing an OpenCL program that uses multiple GPUs.

I have to launch a very large number of threads. To make the best use of GPU resources, I launch only a limited number of threads per kernel invocation, based on the kernel's register and local memory usage. So I launch my kernel n times in total (n = N/M), where N is the total number of threads I have to launch and M is the number of threads that can be launched at a time. But the program gets killed after a large number of iterations.
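The launch count described above can be sketched in C. This is only a minimal sketch, assuming N is not always an exact multiple of M, in which case plain integer division N/M would drop the final partial batch. The helper names launch_count and batch_threads are hypothetical and do not come from the original program.

```c
#include <stddef.h>

/* Hypothetical helper: number of kernel launches needed to cover
 * total_threads (N) when at most threads_per_launch (M) fit at once.
 * Rounds up so that a final partial batch is not dropped. */
size_t launch_count(size_t total_threads, size_t threads_per_launch)
{
    if (threads_per_launch == 0)
        return 0; /* nothing can be launched */
    return total_threads / threads_per_launch
         + (total_threads % threads_per_launch != 0 ? 1 : 0);
}

/* Hypothetical helper: number of threads in launch number i (0-based). */
size_t batch_threads(size_t total_threads, size_t threads_per_launch, size_t i)
{
    size_t start = i * threads_per_launch;
    if (threads_per_launch == 0 || start >= total_threads)
        return 0; /* this launch index is past the end */
    size_t remaining = total_threads - start;
    return remaining < threads_per_launch ? remaining : threads_per_launch;
}
```

With N = 1000 and M = 256, for example, this gives four launches, the last of which covers the remaining 232 threads.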

Assume I have a loop like this:

for (i = 0; i < n; i++) {
    /* write data from CPU to GPU device using clEnqueueWriteBuffer() */
    /* launch the kernel using clEnqueueNDRangeKernel() */
    /* wait for all commands in the command queue to finish using clFinish() */
    /* read data from GPU back to CPU using clEnqueueReadBuffer() */
}

Why is it getting killed once n is greater than 600?

GPU device: Nvidia GeForce GTX 295
Platform version: OpenCL 1.1
Operating system: Ubuntu 11.04

Thanks in advance.

3 Replies
Wenju
Elite

d_new_input_2d = (cl_mem *)malloc( sizeof(cl_mem) * num_devices);     // line 40

d_new_input_2d[icount] = clCreateBuffer(context, CL_MEM_READ_WRITE,
                            max_size * 5 * sizeof(unsigned int), NULL, &ret);      // line 55

Each element's size is max_size * 5 * sizeof(unsigned int).

So is max_size * 5 * sizeof(unsigned int) greater than, equal to, or less than sizeof(cl_mem)?

To be honest, you must be careful about memory size, especially because in your code each loop iteration allocates memory.
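To see why per-iteration allocation matters, the accumulated size can be estimated. This is a rough sketch, assuming each loop iteration creates one buffer of max_size * 5 * sizeof(unsigned int) bytes per device and never releases it. The function name unreleased_bytes is hypothetical, and the thread does not state the actual value of max_size.

```c
/* Hypothetical estimate: bytes still allocated after `iterations` loop passes
 * if every pass creates one buffer per device and none is ever released.
 * max_size is an assumed parameter; the thread does not give its value. */
unsigned long long unreleased_bytes(unsigned long long iterations,
                                    unsigned long long num_devices,
                                    unsigned long long max_size)
{
    /* Size of one buffer, as in the clCreateBuffer() call quoted above. */
    unsigned long long per_buffer = max_size * 5ULL * sizeof(unsigned int);
    return iterations * num_devices * per_buffer;
}
```

As an illustration only: with two devices, a made-up max_size of 1,000,000, and a 4-byte unsigned int, 600 iterations would leave about 24 GB unreleased.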

Look at line 135, kernel[icount] = clCreateKernel(program, "Kernel_name", &ret); are the kernels the same one?

I'm not sure what caused this. What is the error message?


Hi Wenju,

Look at line 135, kernel[icount] = clCreateKernel(program, "Kernel_name", &ret); the kernels are the same one?

Yes, the kernel is the same for both devices.

My program runs and gives correct results for fewer than 600 iterations.

But after approximately 600 iterations it displays a "Killed" message. Why?


I mean that you are wasting a lot of memory, so you should optimize your code. For example:

for (j = 0; j < iteration; j++)
{
     for (i = 0; i < device_number; i++)
     {
          // all operations for one device: create one buffer, enqueue commands, etc.
          // remember to release memory resources
     }
}
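The create/use/release pairing suggested above can be illustrated without a GPU. This is a sketch only: fake_buffer, fake_create, fake_release, and run_passes are hypothetical stand-ins (for a cl_mem object, clCreateBuffer(), clReleaseMemObject(), and the nested loop), and the counter merely tracks how many objects are still alive.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a cl_mem handle. */
typedef struct { void *storage; } fake_buffer;

static long live_buffers = 0; /* objects created but not yet released */

/* Stand-in for clCreateBuffer(): returns NULL on failure. */
fake_buffer *fake_create(size_t bytes)
{
    fake_buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->storage = malloc(bytes);
    if (b->storage == NULL) {
        free(b);
        return NULL;
    }
    live_buffers++;
    return b;
}

/* Stand-in for clReleaseMemObject(). */
void fake_release(fake_buffer *b)
{
    if (b == NULL)
        return;
    free(b->storage);
    free(b);
    live_buffers--;
}

/* Runs the nested loop and returns how many buffers are still alive afterwards.
 * With release_each_pass == 0 the buffers are deliberately leaked to show growth. */
long run_passes(int iterations, int device_number, int release_each_pass)
{
    live_buffers = 0;
    for (int j = 0; j < iterations; j++) {
        for (int i = 0; i < device_number; i++) {
            fake_buffer *buf = fake_create(64);
            if (buf == NULL)
                return -1; /* allocation failed */
            /* ... enqueue write, kernel, and read for device i here ... */
            if (release_each_pass)
                fake_release(buf);
        }
    }
    return live_buffers;
}
```

Calling run_passes(600, 2, 1) leaves no live buffers, while run_passes(600, 2, 0) leaves 1200, one for every device in every iteration. If the buffer size does not change between iterations, creating the buffers once before the outer loop and releasing them once after it is another option.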

Just give it a try. Good luck.
