7 Replies Latest reply on Jun 14, 2011 3:31 PM by himanshu.gautam

    Problem with multi queue and event synchronization

    rbarrere

      Hi,

      I want to use two command queues (one for computation and one for communication) so that data transfers overlap with computation.

       

      My problem is that my program crashes with a segmentation fault.

      I wrote a much simplified program to illustrate the problem (see the attached code); it reproduces exactly the behaviour of my real program.

       

      The principle is simple:

      - I transfer the first array from host to GPU using an asynchronous (non-blocking) write buffer,

      - I transfer the second array in the same way,

      - I start the first kernel as soon as the first transfer is finished, by waiting on its event (normally it should run concurrently with the second transfer),

      - I want to start the second kernel as soon as the first one is finished and the second array is transferred (I wait on the second event, event_array2 in the code). This is the synchronization that triggers the problem.
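
      The intended ordering of these four steps can be sketched as a small timing model in plain C. This is only an illustration, not OpenCL code: `schedule_t`, `model_schedule`, `t_write` and `t_kernel` are hypothetical names, and the model assumes that both queues are in-order, that both transfers take the same time, and that the device really does overlap transfers with kernels.

```c
#include <assert.h>

/* Hypothetical timing model (not part of the OpenCL API).
 * Times are in arbitrary units, measured from the first enqueue. */
typedef struct {
    double write1_end;    /* first transfer complete (event_array)   */
    double write2_end;    /* second transfer complete (event_array2) */
    double kernel1_start;
    double kernel1_end;
    double kernel2_start;
    double kernel2_end;
} schedule_t;

static double max2(double a, double b) { return a > b ? a : b; }

/* Assumption: each queue runs its own commands in order, and a command
 * with a wait list starts only once the listed event has completed. */
schedule_t model_schedule(double t_write, double t_kernel)
{
    schedule_t s;
    assert(t_write >= 0.0 && t_kernel >= 0.0);

    /* Communication queue: the two writes run back to back. */
    s.write1_end = t_write;
    s.write2_end = s.write1_end + t_write;

    /* Computation queue: kernel 1 waits only on the first write,
     * so it can overlap with the second write. */
    s.kernel1_start = s.write1_end;
    s.kernel1_end   = s.kernel1_start + t_kernel;

    /* Kernel 2 waits on its queue predecessor (kernel 1)
     * and on the second write. */
    s.kernel2_start = max2(s.kernel1_end, s.write2_end);
    s.kernel2_end   = s.kernel2_start + t_kernel;
    return s;
}
```

      With t_write = 2 and t_kernel = 3, the model has the first kernel running from 2 to 5 while the second transfer runs from 2 to 4, and the second kernel starting at 5.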

       

      The error occurs in start_thread() in pthread_create.c; gdb reports a segmentation fault there.

       

      It happens ONLY:

      - when my computation queue has profiling (CL_QUEUE_PROFILING_ENABLE) enabled (whether the communication queue has it makes no difference),

      - when I synchronize after running a kernel that takes a cl_mem kernel argument (my kernel does nothing in this example).

      The problem occurs in the same way whether I use clEnqueueWriteBuffer or clEnqueueMapBuffer.

       

      I do not have this problem when running this program on NVIDIA cards.

      My config is: HD5870, Ubuntu 10.04 64-bit, with AMD SDK 3/4 and 11.2/11.3 drivers.

       

      I would like to identify the problem. What am I doing wrong? Is this a bug?

      Thanks for your help!

      //------------------------------------------------------------
      CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array));
      CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array2, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array2));
      //------------------------------------------------------------
      global_work_size[0] = 1;
      // Kernel call
      CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array));
      CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array, NULL));
      //------------------------------------------------------------
      global_work_size[0] = 1;
      // Kernel call
      CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array2));
      CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array2, NULL));
      //------------------------------------------------------------
      clFinish(computation_queue);
      clFinish(communication_queue);