rbarrere
Journeyman III

Problem with multi queue and event synchronization

Hi,

I want to use two queues (one for computation and one for communication) so that transfers overlap with computation.

My problem is that my program crashes with a segmentation fault.

I wrote a very simplified version just to illustrate the problem (see the attached code); it reproduces exactly the behaviour of my program.

 

The principle is simple:

- I transfer the first array from host to GPU using an asynchronous write buffer,

- I transfer the second array in the same way,

- I start the first kernel as soon as the first transfer has finished, by synchronizing on its event (normally it should run concurrently with the second transfer),

- I want to start the second kernel as soon as the first one has finished and the second array has been transferred (I synchronize on event2) <= this synchronization triggers the problem.
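One way to check whether the second transfer and the first kernel really ran concurrently is to compare their profiling timestamps. The sketch below is plain C and assumes the start and end times (in nanoseconds, as reported by clGetEventProfilingInfo for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) have already been read for both events, which in turn assumes profiling is enabled on both queues and that both events come from the same device. The type interval_ns and the function overlap_ns are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: one command's execution window, in nanoseconds.
   The values are assumed to come from clGetEventProfilingInfo. */
typedef struct {
    uint64_t start;
    uint64_t end;
} interval_ns;

/* Return how many nanoseconds the two windows overlap (0 if they do not). */
static uint64_t overlap_ns(interval_ns a, interval_ns b)
{
    uint64_t latest_start = a.start > b.start ? a.start : b.start;
    uint64_t earliest_end = a.end < b.end ? a.end : b.end;
    return earliest_end > latest_start ? earliest_end - latest_start : 0;
}
```

A result of 0 for the second write and the first kernel would suggest that the two queues were serialized rather than overlapped.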

 

The error is in pthread_create.c, in start_thread(); gdb reports a segmentation fault there.

 

It happens ONLY:

- when my computation queue has profiling enabled (CL_QUEUE_PROFILING_ENABLE); the setting on the communication queue has no impact,

- when I synchronize after launching a kernel that takes a cl_mem kernel argument (my kernel does nothing in this example).

The problem occurs in the same way with clEnqueueWriteBuffer and with clEnqueueMapBuffer.

 

I do not have this problem when running this program on NVIDIA cards.

My config is: HD5870, Ubuntu 10.04 64-bit with AMD SDK 3/4 and 11.2/11.3 drivers.

I would like to identify my problem. What am I doing wrong? Is this a bug?

Thanks for your help !

//------------------------------------------------------------
CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array));
CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array2, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array2));
//------------------------------------------------------------
global_work_size[0] = 1;
// Kernel call
CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array));
CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array, NULL));
//------------------------------------------------------------
global_work_size[0] = 1;
// Kernel call
CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array2));
CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array2, NULL));
//------------------------------------------------------------
clFinish(computation_queue);
clFinish(communication_queue);

7 Replies
Jawed
Adept II

In my Deathray project:

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=145496

I use multiple event queues and theoretically overlapping copies and kernel execution. With SDK 2.3 it doesn't work. I have to use clFinish to make the app work. At some point I'll try again with SDK 2.4.

I suspect it's not your fault.


rbarrere,

Thanks for sharing your experience.

So do you mean the program works fine on the AMD GPU when profiling is not enabled on the computation queue? Could you please post a test case so that we can reproduce the issue here?

Thanks


I made a simple program illustrating the problem; you can find it here: https://rapidshare.com/files/459250335/sync_problem.zip

 

The problem seems to appear randomly: sometimes it works, and sometimes it fails with a segmentation fault.

To make the program work, in file sync_problem.c, just comment out line #109 and uncomment line #110 => no more segmentation fault!

 

Compile using 'make', and run './opencl.out'.

 

By default, it will use the first platform (#0) and the first GPU on the computer.

If you have several platforms (as I do), you can pass the number of the platform you want as an argument (e.g. ./opencl.out 1 uses platform #1).

 

Thanks.

 

Edit: I forgot to free the malloc'ed "program_src" buffer, but that does not change anything.


Does nobody else have the same problem? (bump)


rbarrere,

I do not see any hang, with or without commenting out the line you specified, using the latest internal binaries. So the issue is fixed.

I would suggest adding error handling after each cl function call; that makes things easier.
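As an illustration of that suggestion, here is a minimal sketch of a status-checking helper. It is not taken from the posted test case: the names cl_status_name, cl_report and CHECK_CL are hypothetical, the status values are written as literals (matching the definitions in the OpenCL headers) so that the sketch compiles without CL/cl.h, and the list of codes is deliberately partial.

```c
#include <stdio.h>

/* Map a few OpenCL status codes to their names. The numeric values are the
   ones defined in the OpenCL headers; they are literals here so the sketch
   compiles without CL/cl.h. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -57: return "CL_INVALID_EVENT_WAIT_LIST";
    default:  return "unrecognized status";
    }
}

/* Hypothetical helper: print a diagnostic and return 1 if the call failed,
   return 0 if it succeeded. */
static int cl_report(int status, const char *call, const char *file, int line)
{
    if (status == 0)
        return 0;
    fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",
            file, line, call, cl_status_name(status), status);
    return 1;
}

/* Wrap any call that returns a status code. */
#define CHECK_CL(call) cl_report((call), #call, __FILE__, __LINE__)
```

With the real headers included, something like `if (CHECK_CL(clFinish(computation_queue))) return 1;` would stop at the first failing call and print where it happened.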


Please, could you tell me :

- which OS you use,

- which card you use (the device given by the program),

- which drivers/SDK you use.

 

It could help me to understand where my problem is.

Thanks for your help.


I work on Vista64 with a Juniper (5770). I tested the code with internal libraries, not with the SDK 2.4 ones, but I do not think there should be any issues with 2.4 either.

In the code you sent, I do not see proper error-code handling. clBuildProgram was failing on my system.
