Hi everybody,
I’m stuck on a small problem which is most probably due to my lack of OpenCL knowledge. What I’m trying to do is capture a colored image, send it to my GPU, convert it to grayscale, and read it back. I’m using buffers because it’s only an exercise. Since the mistake is most probably a logical one, I’m also writing down my reasoning. Here it is:
In order to read the picture I first use OpenCV:
IplImage* frame = cvLoadImage("D:\\Special Folders\\Pictures\\SamplePictures\\JellyfishS.jpg");
W = frame->width;
H = frame->height;
To access the image itself, the IplImage structure provides me with a char pointer, “imageData”. I can also access the width and height of the image in pixels.
I also create an “empty” image with only one channel that will receive the result of the kernel.
IplImage * op_data = cvCreateImage(cvSize(frame->width, frame->height), frame->depth, 1); // last argument is the number of channels
After creating the platform, context, queue, program and so on, I create the buffers like this:
size_t mem_size = (size_t)frame->imageSize; // size of the image in bytes
size_t outputSize = (size_t)op_data->imageSize;
cl_mem data = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, (void*)frame->imageData, &ciErrNum);
cl_mem output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, outputSize, NULL, &ciErrNum);
My kernel is created like this:
cl_kernel mykernel = clCreateKernel (myprog , "luminosityGrayscale",&status);
I set the arguments:
status = clSetKernelArg(mykernel, 0, sizeof(cl_mem), &data);
status = clSetKernelArg(mykernel, 1, sizeof(cl_mem), &output);
And finally I launch the kernel like this:
size_t globalws[2] = {W, H};
cl_event evt;
status = clEnqueueNDRangeKernel(queue, mykernel, 2, NULL, globalws, NULL, 0, NULL, &evt);
I read back the data like this:
status = clEnqueueReadBuffer(queue, output, CL_TRUE, 0, outputSize, (void*)op_data->imageData, 0, NULL, NULL);
Now, because the image I’m manipulating is a 24-bit RGB image (so 8 bits per channel), and because OpenCV stores the pixels interleaved, in a sequence like RGBRGBRGBRGB, I decided to use a uchar3 for the kernel’s input argument, thinking that this way I could easily manipulate each pixel, and a uchar for the output argument, since the resulting image has only one channel (so only 8 bits per pixel). As shown in the code above, I also created a global work size matching the size of my image in pixels, which means one work-item per pixel, if I understand the concept correctly. So my kernel looks like this:
__kernel void luminosityGrayscale(__global uchar3 *data, __global uchar *output) {
    const int i = get_global_id(0);
    const int j = get_global_id(1);
    const int sizei = get_global_size(0);
    int index = i + j * sizei;
    uint R = data[index].s0;
    uint G = data[index].s1;
    uint B = data[index].s2;
    uint average = (uint)(R * 0.07 + G * 0.71 + B * 0.21);
    output[index] = (uchar)average;
}
The result of this is wrong. I do get a grayscale image, but it looks as if the image were duplicated three or four times, with the copies superposed and shifted.
My guess is that I screwed up the indexing in the kernel, but I can’t pinpoint the flaw. It’s also possible that I made a mistake somewhere else. So my question is: where is my logic flawed?