Dear all,
I've got a problem with a kernel that should expand an 8-bit grayscale image into a 24-bit RGB image. The kernel produces garbage, although it works fine on Nvidia machines:
__kernel void
copy_pixels_bwk(__global unsigned char* input, __global unsigned char* output,
                const unsigned int width, const unsigned int height,
                const int off_bitmap, const unsigned int output_elements,
                const int dorgb, const unsigned char ucalpha,
                const int use_pseudo_lut, __global unsigned char* PSEUDO_LUT)
{
    unsigned int xIndex = get_global_id(0);
    unsigned int yIndex = get_global_id(1);
    unsigned int in_val = 0;
    /* One input byte per pixel, output_elements output bytes per pixel. */
    unsigned int index_in = mul24(width, yIndex) + xIndex;
    unsigned int index_out = output_elements * width * yIndex + output_elements * xIndex;
    if (xIndex < width && yIndex < height)
    {
        /* Copy the gray value into the R, G and B bytes. */
        in_val = input[index_in];
        output[index_out++] = (unsigned char)in_val;
        output[index_out++] = (unsigned char)in_val;
        output[index_out] = (unsigned char)in_val;
    }
}
When I set in_val = xIndex or in_val = yIndex, I get a clean ramp across the image, so the output side is OK. Setting in_val = index_out or in_val = index_in also gives a proper ramp.
So the problem seems to be that the kernel cannot read input properly, but I do not know why.
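For comparison, the expected result can be computed on the CPU with the same index arithmetic as the kernel and then diffed against the buffer read back from the device. The following is only a minimal sketch: expand_gray_reference is a hypothetical helper that is not part of the original host code, and it assumes tightly packed rows (no row offset) and output_elements >= 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for the gray -> RGB expansion, mirroring the kernel's
// index_in / index_out arithmetic. Hypothetical helper for debugging only.
std::vector<std::uint8_t> expand_gray_reference(const std::vector<std::uint8_t>& input,
                                                unsigned int width,
                                                unsigned int height,
                                                unsigned int output_elements)
{
    // Assumptions: at least 3 output bytes per pixel, tightly packed input.
    assert(output_elements >= 3);
    assert(input.size() >= static_cast<std::size_t>(width) * height);

    std::vector<std::uint8_t> output(
        static_cast<std::size_t>(output_elements) * width * height, 0);

    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            const std::size_t index_in = static_cast<std::size_t>(width) * y + x;
            std::size_t index_out =
                static_cast<std::size_t>(output_elements) * width * y +
                static_cast<std::size_t>(output_elements) * x;

            const std::uint8_t v = input[index_in];
            output[index_out++] = v;  // R
            output[index_out++] = v;  // G
            output[index_out]   = v;  // B
        }
    }
    return output;
}
```

Comparing this result byte by byte with the device output should show whether the garbage is random or follows a pattern, such as every fourth byte being correct.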
Host code (part):
copy_pixels_bw_kernel.setArg(0, m_OCLBufferBW8);
copy_pixels_bw_kernel.setArg(1, m_OCLBufferRGB24);
copy_pixels_bw_kernel.setArg(2, width);
copy_pixels_bw_kernel.setArg(3, height);
copy_pixels_bw_kernel.setArg(4, m_iOff_width_bitmap);
copy_pixels_bw_kernel.setArg(5, output_elements);
copy_pixels_bw_kernel.setArg(6, dorgb);
copy_pixels_bw_kernel.setArg(7, ucalpha);
copy_pixels_bw_kernel.setArg(8, use_pseudo_lut);
copy_pixels_bw_kernel.setArg(9, m_OCLBufferPseudoLut);
m_OCLQueue.enqueueNDRangeKernel(copy_pixels_bw_kernel, cl::NullRange, cl::NDRange(width, height), cl::NullRange);
m_OCLQueue.finish();
if (output != NULL)
{
    NVTX nvtx2("Read copy_pixels_bw_kernel");
    m_OCLQueue.enqueueReadBuffer(m_OCLBufferRGB24, CL_TRUE, 0, (m_iOff_width_bitmap + width) * height * output_elements, output);
}
m_OCLQueue.finish();
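One thing that can be checked with plain arithmetic: the kernel writes rows with a stride of output_elements * width, while the read-back requests (m_iOff_width_bitmap + width) * height * output_elements bytes. The two agree only when the offset is zero. The sketch below compares both sizes. The function names are hypothetical, and it assumes that m_iOff_width_bitmap is non-negative and counted in pixels, which the post does not state.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes the kernel can touch in the output buffer: its last write
// lands at output_elements*width*(height-1) + output_elements*(width-1) + 2.
std::size_t bytes_written_by_kernel(unsigned int width,
                                    unsigned int height,
                                    unsigned int output_elements)
{
    assert(width > 0 && height > 0 && output_elements >= 3);
    const std::size_t last_index =
        static_cast<std::size_t>(output_elements) * width * (height - 1) +
        static_cast<std::size_t>(output_elements) * (width - 1) + 2;
    return last_index + 1;
}

// Number of bytes requested by the enqueueReadBuffer call in the host code.
// Assumption: the offset is a non-negative pixel count per row.
std::size_t bytes_read_by_host(int off_width_bitmap,
                               unsigned int width,
                               unsigned int height,
                               unsigned int output_elements)
{
    assert(off_width_bitmap >= 0);
    return (static_cast<std::size_t>(off_width_bitmap) + width) *
           height * output_elements;
}
```

If bytes_read_by_host is larger than bytes_written_by_kernel, the tail of the host buffer holds bytes the kernel never wrote.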
The buffers in the host code are fine.
Since this code works on an Nvidia GPU and I have already run tons of checks, I am stuck here. Perhaps one of you can help.