rbarrere

Problem with multi queue and event synchronization

Discussion created by rbarrere on Apr 22, 2011
Latest reply on Jun 14, 2011 by himanshu.gautam

Hi,

I want to use two separate command queues (one for computations and one for communications) so that data transfers are overlapped with, and hidden behind, computations.

My problem is that my program crashes with a segmentation fault.

I wrote a very simplified program (see the attached code) that reproduces exactly the behaviour of my real program.

The principle is simple:

- I transfer the first array from the host to the GPU using a non-blocking (asynchronous) write buffer call (clEnqueueWriteBuffer),

- I transfer the second array in the same way,

- I launch the first kernel as soon as the first transfer has finished, by synchronizing on that transfer's event (normally the kernel should then run concurrently with the second transfer),

- I want to launch the second kernel as soon as the first kernel has finished and the second array has been transferred (I synchronize on event_array2) <= this synchronization is what triggers the problem.

The crash happens in start_thread() in pthread_create.c; gdb reports it as a segmentation fault.

It happens ONLY:

- when my computation queue has profiling enabled (CL_QUEUE_PROFILING_ENABLE); enabling it on the communication queue has no impact,

- when I synchronize after a kernel that takes a cl_mem kernel argument (in this example the kernel does nothing).

The problem occurs in the same way whether I use clEnqueueWriteBuffer or clEnqueueMapBuffer.

I do not have this problem when running this program on NVIDIA cards.

My configuration is: HD5870, Ubuntu 10.04 64-bit with AMD SDK 3/4 and 11.2/11.3 drivers.

I would like to identify the cause of my problem. What am I doing wrong? Is this a bug?

Thanks for your help !

//------------------------------------------------------------
// Two non-blocking host-to-GPU transfers on the communication queue
CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array));
CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array2, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array2));
//------------------------------------------------------------
global_work_size[0] = 1;
// Kernel call: the first kernel waits for the first transfer (event_array)
CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array));
CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array, NULL));
//------------------------------------------------------------
global_work_size[0] = 1;
// Kernel call: the second kernel waits for the second transfer (event_array2);
// ordering after the first kernel relies on the computation queue being in-order
CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array2));
CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array2, NULL));
//------------------------------------------------------------
// Wait for both queues to drain
clFinish(computation_queue);
clFinish(communication_queue);
