cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

mosix0
Journeyman III

OpenCL library steals signals

I have a program that uses OpenCL, but also does other things.  At times the program sleeps waiting for external signals (using "sigpause()"), but then I discovered that it failed to wake up when an expected signal arrived.

 

I searched and found the reason: the OpenCL library uses full threads, including the clone-flags CLONE_SIGHAND and CLONE_THREAD: this causes the library threads to share signals with the main program, so at random an OpenCL library-thread picks up a signal that is intended to wake the main program, processes the interrupt and returns to whatever it was doing before, but then the main program is never awakened and remains stuck.

 

I believe that although the library shares memory and perhaps also some file-descriptors with the main program, it has no need to share signals as well.

 

In the least, if CLONE_SIGHAND cannot be avoided, the library should block all signals that it does not use - that would direct the Linux kernel to send those signals to the main program, rather than to a library thread.

 

(note that if for any reason the library chooses to continue sharing signals with the main program, but only block them, there can still be a race just between the time that a library-thread is created and when it blocks unused signals - to prevent this race, the library should block all signals before calling CLONE, then unblock them in the parent/main thread, but not unblock them in the new library thread(s)).

 

Hope this is not too complicated, but it is really a bummer to have the OpenCL library which is supposed to be a "black-box" affect unrelated aspects of the calling program.

0 Likes
4 Replies
genaganna
Journeyman III

Mosix0,

          Could you please provide a test case to reproduce your problem?

0 Likes

Sorry for the delay - I had to construct a fully working OpenCL program for that (anything really, just add two vectors, this program doesn't even bother collecting the result).

 

Description: A son process sends the main program a signal, SIGUSR1 after one second.  Meanwhile the main process sets up and queues an OpenCL kernel, then blocks all signals while it performs some "work".  When the "work" is finished, it uses "sigsuspend" to receive the signal.  However, the signal is caught before that by some OpenCL library thread.

 

A good output should be:

 

Main thread is 12345 - Woke up thread 12345

 

Instead I get something like:

 

Main thread is 12345 - Woke up thread 12348

 

Here goes the program:

 

#include <cl.h>
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

char *src = "__kernel void add(__global int *A, __global int *B, __global int *C, int n){int i = get_global_id(0);C = A + B;}";

int main_pid;

#define __NR_gettid 186

void
wakeup(int sig)
{
printf("Main thread is %d - Woke up thread %d\n", main_pid, syscall(__NR_gettid));
exit(0);
}

#define SIZE 60000

int
main(na, argv)
char *argv[];
{
cl_platform_id platform;
cl_context_properties cps[3];
cl_context context;
cl_command_queue queue;
cl_device_id device;
cl_program program;
cl_mem ma, mb, mc;
cl_kernel kernel;
cl_event event;
int n = SIZE;
int A[SIZE], B[SIZE], C[SIZE];
size_t sizes[3] = {SIZE, 0, 0};
int son;
cl_int res;
sigset_t all, none;
long work;

main_pid = getpid();
signal(SIGUSR1, wakeup);
if(!(son = fork()))
{
sleep(1);
kill(getppid(), SIGUSR1);
exit(0);
}
if(clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
{
fprintf(stderr, "Could not get a platform\n");
exit(1);
}
cps[0] = CL_CONTEXT_PLATFORM;
cps[1] = (cl_context_properties)platform;
cps[2] = 0;
if(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL)
!= CL_SUCCESS)
{
fprintf(stderr, "No device\n");
exit(1);
}
if(!(context = clCreateContext(cps, 1, &device, NULL, NULL, NULL)))
{
fprintf(stderr, "No context\n");
exit(1);
}
if(!(queue = clCreateCommandQueue(context, device,
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, NULL)))
{
fprintf(stderr, "No queue\n");
exit(1);
}
if(!(program = clCreateProgramWithSource(context, 1,
(const char **)&src, NULL, NULL)))
{
fprintf(stderr, "No program\n");
exit(1);
}
if(clBuildProgram(program, 0, NULL, "", NULL, NULL) != CL_SUCCESS)
{
printf("Program not built\n");
exit(1);
}
if(!(kernel = clCreateKernel(program, "add", NULL)))
{
printf("No kernel\n");
exit(1);
}
if(!(ma = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR,
sizeof(A), A, NULL)) ||
!(mb = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR,
sizeof(B), B, NULL)) ||
!(mc = clCreateBuffer(context, CL_MEM_WRITE_ONLY|CL_MEM_USE_HOST_PTR,
sizeof(C), C, NULL)))
{
fprintf(stderr, "Failed creating buffers\n");
exit(1);
}
if(clSetKernelArg(kernel, 0, sizeof(ma), &ma) != CL_SUCCESS ||
clSetKernelArg(kernel, 1, sizeof(mb), &mb) != CL_SUCCESS ||
clSetKernelArg(kernel, 2, sizeof(mc), &mc) != CL_SUCCESS ||
clSetKernelArg(kernel, 3, sizeof(n), &n) != CL_SUCCESS)
fprintf(stderr, "Failed setting arg n\n");
if((res = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, sizes,
NULL, 0, NULL, &event)) != CL_SUCCESS)
{
fprintf(stderr, "Kernel not running, res=%d\n", res);
exit(1);
}
sigfillset(&all);
sigemptyset(&none);

sigprocmask(SIG_SETMASK, &all, NULL);
for(work = 0 ; work < 1000000000L ; work++)
;
sigsuspend(&none);
exit(0);
}
0 Likes

Mosix0,

           Thank you for reporting this issue with good testcase.

0 Likes

The release notes for SDK v2.01 claim:

 

* OpenCL runtime no longer catches signals sent by the application

 

However, I just downloaded v2.01 and this problem persists.

0 Likes