Archives Discussions

gopal_hc · 08-23-2013

Hi,

To debug my OpenCL kernel, I used the "Debugging CPU Kernels with GDB" method described in the OpenCL Programming Guide, making two changes to the program:

1. Changed CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_CPU so that the program runs on the CPU.

2. Compiled the kernel with the options parameter of clBuildProgram() set to "-g -O0".

Debugging on the CPU went fine, and I got the correct result as well.

After changing the device type from CPU back to GPU (still building with "-g -O0"), my result was still correct.

But when I set the build options to NULL, with the device type still GPU, I got wrong results for a few inputs. When I set the options back to "-g -O0" on the GPU, I again got correct results for all inputs.
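The builds being compared can be kept apart with a small host-side helper. This is only a sketch: the enum and the function name `build_options` are hypothetical, and the middle mode is an extra assumption of mine. It uses `-cl-opt-disable`, the standard OpenCL option that turns optimizations off without asking for debug information, which may help tell the effect of `-g` apart from the effect of `-O0`.

```c
#include <stddef.h>

/* Hypothetical enum, not part of OpenCL: names the builds being compared. */
enum build_mode { BUILD_DEFAULT, BUILD_NO_OPT, BUILD_DEBUG };

/* Returns the string to pass as the options argument of clBuildProgram(). */
static const char *build_options(enum build_mode mode)
{
    switch (mode) {
    case BUILD_NO_OPT:
        return "-cl-opt-disable"; /* standard option: optimizations off, no debug info */
    case BUILD_DEBUG:
        return "-g -O0";          /* the options used in this post */
    default:
        return NULL;              /* default build, optimizations enabled */
    }
}
```

The result would be passed as the fourth argument, e.g. `clBuildProgram(program, 1, &device, build_options(mode), NULL, NULL)`, where `program` and `device` stand for the handles the host code already has. If the BUILD_NO_OPT build also gives correct results, that would point towards an optimization issue rather than anything specific to `-g`, but that is a guess until it is tried.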

So I want to know:

1. What actually happens when OpenCL kernels are built with the -g option?

2. Why do I get wrong results for a few inputs when I remove the -g option from clBuildProgram()?

Thanks !!

himanshu_gautam · 08-23-2013

Can you attach your kernel here?

Also, please give details of your GPU, Catalyst driver, APP SDK version, OS, and bitness.

gopal_hc · 08-23-2013

The kernel code is very large, so I will try to demonstrate the problem with a simple example like the one below:

void d_memcpy1(uchar *dst, __global uchar *src, uint len)
{
    int i = 0;

    // copy len bytes from global memory into a private buffer
    for (i = 0; i < len; i++)
        dst[i] = src[i];
}

void d_memcpy2(__global uchar *dst, uchar *src, uint len)
{
    int i;

    // copy len bytes from a private buffer back to global memory
    for (i = 0; i < len; i++)
        dst[i] = src[i];
}

__kernel void demoKernel(__global uchar *d_io_2d,
                         __constant uint *d_common_input,
                         int d_maxSize)
{
    long index = get_global_id(0);

    if (index >= d_maxSize)
        return;

    uchar msg0[16], msg1[16];

    // reading 32 bytes of data from global memory
    d_memcpy1(msg0, &d_io_2d[(index*16)*2], 16);        // reading first 16 consecutive bytes
    d_memcpy1(msg1, &d_io_2d[((index*16)*2) + 16], 16); // reading next 16 consecutive bytes

    // doing some stuff on these 32 bytes of data

    // writing the 32 bytes back to the same global memory
    d_memcpy2(&d_io_2d[(index*16)*2], msg0, 16);        // writing first 16 consecutive bytes
    d_memcpy2(&d_io_2d[((index*16)*2) + 16], msg1, 16); // writing next 16 consecutive bytes
}
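To pin down exactly which bytes go wrong, the GPU output could be compared against a plain C reference run on the host. The sketch below mirrors only the data movement of the example kernel (each work-item owns one 32-byte record, read and written as two 16-byte halves). The processing step is elided in the post, so the reference leaves the data unchanged. The names `demo_kernel_reference` and `first_mismatch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side reference of the example kernel's data movement:
   for each index, copy the 32-byte record into two 16-byte
   buffers and write both halves back to the same place. */
static void demo_kernel_reference(unsigned char *io, int max_size)
{
    assert(io != NULL);
    for (int index = 0; index < max_size; index++) {
        unsigned char msg0[16], msg1[16];
        size_t base = (size_t)index * 16 * 2;

        memcpy(msg0, &io[base], 16);      /* first 16 bytes */
        memcpy(msg1, &io[base + 16], 16); /* next 16 bytes */

        /* the "some stuff" step of the real kernel would go here */

        memcpy(&io[base], msg0, 16);
        memcpy(&io[base + 16], msg1, 16);
    }
}

/* Returns the offset of the first differing byte, or -1 if the
   two buffers are identical over n bytes. */
static long first_mismatch(const unsigned char *a,
                           const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}
```

Running the reference on a copy of the input and calling `first_mismatch` on it and the buffer read back from the GPU gives a byte offset; dividing that offset by 32 gives the work-item index, and the remainder shows whether the first or the second 16-byte half is affected.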

I am using a machine with the following details:

GPU: Tahiti

Catalyst Driver: 9.0.2

APP SDK version: OpenCL 1.2

OS: Ubuntu

Bitness: 64-bit

I hope this example is helpful.

When I changed the compiler options from "-g -O0" to NULL, the result for the first 16 bytes was wrong, while the result for the next 16 bytes was correct. When I set the options back to "-g -O0", I got correct results for both the first and the next 16 bytes.

Note: On an Nvidia Tesla K20m card, the same program works correctly.

issue with -g flag compiler option of clBuildProgram