Hi,
I want to use different queues (one for computations and one for communications) to mask communications with computations.
My problem is that my program crashes with a segmentation fault.
I wrote a very simplified program just to illustrate the problem (see the code below); it exactly reproduces the behaviour of my real program.
The principle is simple:
- I transfer the first array from host to GPU using an asynchronous (non-blocking) buffer write,
- I transfer the second array in the same way,
- I start the first kernel as soon as the first transfer is finished, by synchronizing on its event (normally it should run concurrently with the second transfer),
- I want to start the second kernel as soon as the first one is finished and the second array is transferred (I synchronize on event2) <= this synchronization causes the problem.
The error is in pthread_create.c, in start_thread(), and gdb reports a Segmentation Fault.
It happens ONLY :
- when my computation queue has profiling (CL_QUEUE_PROFILING_ENABLE) enabled (communication queue has no impact),
- when I synchronize after using a kernel that uses cl_mem kernel argument (my kernel does nothing in this example).
This problem occurs in the same way with both clEnqueueWriteBuffer and clEnqueueMapBuffer.
I do not have this problem when running this program on NVIDIA cards.
My config is: HD5870, Ubuntu 10.04 64-bit with AMD SDK 3/4 and 11.2/11.3 drivers.
I would like to identify my problem. What am I doing wrong? Is this a bug?
Thanks for your help !
//------------------------------------------------------------
CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array));
CL_CHECK(clEnqueueWriteBuffer(communication_queue, d_array2, CL_FALSE, 0, 5*sizeof(int), vals, 0, NULL, &event_array2));
//------------------------------------------------------------
global_work_size[0] = 1;
// Kernel call
CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array));
CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array, NULL));
//------------------------------------------------------------
global_work_size[0] = 1;
// Kernel call
CHECK_STATUS(clSetKernelArg(empty_kernel, 0, sizeof(cl_mem), (void*) &d_array2));
CHECK_STATUS(clEnqueueNDRangeKernel(computation_queue, empty_kernel, 1, NULL, global_work_size, NULL, 1, &event_array2, NULL));
//------------------------------------------------------------
clFinish(computation_queue);
clFinish(communication_queue);
In my Deathray project:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=145496
I use multiple event queues and theoretically overlapping copies and kernel execution. With SDK 2.3 it doesn't work. I have to use clFinish to make the app work. At some point I'll try again with SDK 2.4.
I suspect it's not your fault.
rbarrere,
Thanks for sharing your experience.
So do you mean the program works fine on an AMD GPU when profiling is not enabled on the computation queue? Could you please post a test case so we can reproduce the issue here?
Thanks
I made a simple program illustrating the problem; you can find it here: https://rapidshare.com/files/459250335/sync_problem.zip
The problem seems to appear randomly: sometimes the program works, and sometimes it crashes with a segmentation fault.
To make the program work, in file sync_problem.c, just comment out line #109 and uncomment line #110 => no more segmentation fault!!!
Compile using 'make', and execute with 'opencl.out'.
By default, it will use the first platform (#0) and the first GPU on the computer.
If you use different platforms (like I do), you can add as argument the number of the platform you want to use (ex: ./opencl.out 1 will use the platform #1).
Thanks.
Edit : I forgot to free the "program_src" malloc, but it does not change anything.
Does nobody else have this problem? (up)
rbarriere,
I do not see any hang with or without commenting out the line you specified, using the latest internal binaries. So the issue is fixed.
I would suggest adding exception handling after each cl function call; that makes things easier.
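As a rough illustration of checking the status after each cl call, here is a minimal sketch. The names cl_status_name, cl_check_status and CL_TRY are hypothetical and are not the CL_CHECK / CHECK_STATUS macros from the original post. To stay self-contained it takes the status as a plain int and hard-codes a few OpenCL 1.x status values; real code would include CL/cl.h and use the CL_* constants instead.

```c
#include <stdio.h>

/* Hypothetical helper: maps a few OpenCL 1.x status codes to their names.
   The literals mirror the values defined in CL/cl.h; real code should
   include that header and use the CL_* constants directly. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -57: return "CL_INVALID_EVENT_WAIT_LIST";
    case -58: return "CL_INVALID_EVENT";
    default:  return "unknown OpenCL error";
    }
}

/* Reports a failed call on stderr. Returns 1 on success, 0 on failure,
   so the caller decides whether to clean up, retry, or exit. */
static int cl_check_status(int status, const char *call,
                           const char *file, int line)
{
    if (status == 0)
        return 1;
    fprintf(stderr, "%s:%d: %s failed with %s (%d)\n",
            file, line, call, cl_status_name(status), status);
    return 0;
}

/* Hypothetical macro: evaluates the call once and records where it was made. */
#define CL_TRY(call) cl_check_status((call), #call, __FILE__, __LINE__)
```

A call site could then read: if (!CL_TRY(clFinish(computation_queue))) return 1; With a check like this around clBuildProgram, a build failure would be reported at the point where it happens rather than showing up later as a crash.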
Please, could you tell me :
- which OS you use,
- which card you use (the device given by the program),
- which drivers/SDK you use.
It could help me to understand where my problem is.
Thanks for your help.
I work on Vista64 with a Juniper (5770). I tested the code with internal libraries and not with the SDK 2.4 ones, but I do not think there should be any issues with 2.4 either.
As for the code you sent, I do not see proper error code handling. clBuildProgram was failing on my system.