Hello
I am new to OpenCL and I am trying to implement a Gaussian filter on a simple matrix of ints. I have a 1000 x 1000 matrix and I need to run a separable filter, first in the x-direction and then in the y-direction. For the width-wise (row) filtering, I set up the launch like this:
size_t global_threads[2] = {1000, 1000};
size_t local_threads[1] = {1000};
clEnqueueNDRangeKernel(command_queue, row_kernel, 2, NULL, global_threads, local_threads, 0, NULL, NULL);
Inside the kernel, I declare:
int lid = get_local_id(0);
int IDy = get_global_id(1);
__local int localSrc[1000];
localSrc[lid] = Src[IDy*1000 + lid];
What I want is for each work-group to cover one whole row (so there are 1000 work-groups, one per row), with every work-item copying its own element so that the whole row ends up in that work-group's local memory. Then, when the filter runs for any work-item in the 1000 x 1000 matrix, that work-item reads its neighbouring row elements from local memory, which I expect to be faster than reading them from global memory and to avoid race conditions.
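To make the intended row pass concrete, here is a minimal CPU-side sketch in plain C of what a single work-group is meant to compute. The 3-tap weights {1, 2, 1} / 4, the clamped border handling, and the names filter_row, clamp_index and scratch are assumptions for illustration only; the actual Gaussian weights are not shown in this post. The scratch buffer stands in for the __local array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-tap kernel {1, 2, 1} / 4; the real weights are not given. */
enum { RADIUS = 1 };
static const int WEIGHTS[2 * RADIUS + 1] = {1, 2, 1};
static const int WEIGHT_SUM = 4;

/* Clamp an index into [0, width - 1] (assumed border handling). */
static size_t clamp_index(long i, size_t width) {
    if (i < 0) return 0;
    if ((size_t)i >= width) return width - 1;
    return (size_t)i;
}

/* Filter one row: stage it in scratch (stand-in for __local memory),
 * then read every neighbour from scratch instead of from src. */
void filter_row(const int *src, int *dst, int *scratch, size_t width) {
    assert(width > 0);

    /* Copy phase, the analogue of localSrc[lid] = Src[IDy*1000 + lid]. */
    for (size_t x = 0; x < width; ++x)
        scratch[x] = src[x];

    /* On the device, a barrier(CLK_LOCAL_MEM_FENCE) would be needed here so
     * that all copies finish before any work-item reads its neighbours. */

    /* Filter phase: weighted sum over the staged row. */
    for (size_t x = 0; x < width; ++x) {
        int acc = 0;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            acc += WEIGHTS[k + RADIUS] * scratch[clamp_index((long)x + k, width)];
        dst[x] = acc / WEIGHT_SUM;
    }
}
```

Calling filter_row once per row reproduces the x-direction pass; the y-direction pass would be the same loop applied along columns.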
However, the clEnqueueNDRangeKernel call above fails with error code -54. What does this mean?
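For reference, the numeric codes returned by clEnqueueNDRangeKernel are defined as macros in the Khronos CL/cl.h header, and -54 is CL_INVALID_WORK_GROUP_SIZE there. The helper below is a small sketch for printing a name instead of a number. The function name cl_error_name is hypothetical, and only a handful of codes relevant to kernel launches are listed.

```c
/* Map a few OpenCL status codes to their macro names.
 * Values are the ones defined in CL/cl.h; the list is deliberately partial. */
const char *cl_error_name(int err) {
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -53: return "CL_INVALID_WORK_DIMENSION";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    case -55: return "CL_INVALID_WORK_ITEM_SIZE";
    case -56: return "CL_INVALID_GLOBAL_OFFSET";
    default:  return "unlisted OpenCL error code";
    }
}
```

A typical use would be to store the return value of clEnqueueNDRangeKernel in a cl_int and print cl_error_name(err) whenever it is not CL_SUCCESS.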
Also, I would like to know whether my global_threads and local_threads sizes are set correctly for copying each row into the same local memory for efficiency.
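Some constraints that seem relevant to the sizes: with work_dim = 2, the local size array needs two entries; the total number of work-items in a group must not exceed the device limit reported by clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE; and (before OpenCL 2.0) each global size must be divisible by the matching local size. The sketch below picks a row-wise group width under those rules. The function name pick_row_group_size is hypothetical, and whether this is the right tiling for the filter is an open assumption.

```c
#include <stddef.h>

/* Return the largest divisor of row_width that does not exceed
 * max_group_size, or 0 if either input is 0. */
size_t pick_row_group_size(size_t row_width, size_t max_group_size) {
    if (row_width == 0 || max_group_size == 0)
        return 0;

    /* Start from the largest size the device allows, capped at the row width. */
    size_t candidate = row_width < max_group_size ? row_width : max_group_size;

    /* Walk down until the row divides evenly; 1 always divides, so this ends. */
    while (row_width % candidate != 0)
        --candidate;

    return candidate;
}
```

With a device limit of 256, for example, this gives 250 for a 1000-wide row, so the launch could use size_t local_threads[2] = {250, 1}. In that case a group holds only part of a row, so the local array would also need the neighbouring elements at the tile edges.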
Thanks in advance