
gopal_hc
Journeyman III

Issue with the -g compiler option of clBuildProgram

Hi,

To debug my OpenCL kernel, I used the "Debugging CPU Kernels with GDB" method explained in the OpenCL Programming Guide, making two changes to the program:

1.  Changed CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_CPU so that the program runs on the CPU.

2.  Compiled the kernel with the build options "-g -O0" (the options parameter of clBuildProgram()).

Debugging went smoothly on the CPU, and I got the correct result as well.

After changing the device type from CPU back to GPU, the result was still correct.

But when I set the build options to NULL, with the device type still GPU, I got wrong results for some inputs. When I set the options back to "-g -O0" on the GPU, I got correct results for all inputs.

So I want to know:

1.  What actually happens when OpenCL kernels are built with the -g option?

2.  Why do I get wrong results for some inputs when I remove the -g option from clBuildProgram?
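One way to narrow down question 2 is to change a single build option at a time and check the output after each build, so that debug information (-g) and optimization level (-O0) are tested separately. The sketch below assumes a hypothetical callback, `build_and_check`, that passes the given string to clBuildProgram, runs the kernel, and compares the output with a host reference; its body is not shown. "-g" and "-O0" are the options used in this thread, while "-cl-opt-disable" is the standard OpenCL build option for disabling optimizations.

```c
#include <stdio.h>

/* Hypothetical callback (not shown): builds the program with `options`
 * (passed to clBuildProgram as-is), runs demoKernel, and returns 1 if the
 * output matches a host reference, 0 otherwise. */
typedef int (*build_and_check_fn)(const char *options, void *ctx);

/* Option sets to try. NULL means the default (optimized) build. */
static const char *const option_sets[] = {
    NULL,              /* default build: wrong results in this thread */
    "-g -O0",          /* debug build: correct results in this thread */
    "-g",              /* debug information only */
    "-O0",             /* optimization level only */
    "-cl-opt-disable", /* standard OpenCL option that disables optimizations */
};

enum { NUM_OPTION_SETS = (int)(sizeof option_sets / sizeof option_sets[0]) };

/* Printable name for option set i; NULL is shown as "(NULL)". */
const char *option_label(int i)
{
    return option_sets[i] != NULL ? option_sets[i] : "(NULL)";
}

/* Tries every option set, stores 1/0 (correct/wrong) in results[],
 * prints one line per set, and returns how many sets gave correct output. */
int bisect_build_options(build_and_check_fn check, void *ctx,
                         int results[NUM_OPTION_SETS])
{
    int passed = 0;
    for (int i = 0; i < NUM_OPTION_SETS; i++) {
        results[i] = check(option_sets[i], ctx) ? 1 : 0;
        printf("%-16s -> %s\n", option_label(i),
               results[i] ? "correct" : "WRONG");
        passed += results[i];
    }
    return passed;
}
```

If "-O0" alone gives correct output while "-g" alone does not, the difference follows the optimization level rather than the debug information.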

Thanks !!

himanshu_gautam
Grandmaster

Re: issue with -g flag compiler option of clBuildProgram

Can you attach your kernel here?

Also, please give details about your GPU, Catalyst driver, APP SDK version, OS, and bitness.

gopal_hc
Journeyman III

Re: issue with -g flag compiler option of clBuildProgram

The kernel code is very large, so I will try to demonstrate the problem with a simplified example:

void d_memcpy1(uchar *dst, __global uchar *src, uint len)
{
        uint i;

        for(i = 0; i < len; i++)
          dst[i] = src[i];   // copy from global memory into a private buffer
}

void d_memcpy2(__global uchar *dst, uchar *src, uint len)
{
        uint i;

        for(i = 0; i < len; i++)
          dst[i] = src[i];   // copy from a private buffer back to global memory
}

__kernel void demoKernel(__global uchar *d_io_2d,
                         __constant uint *d_common_input,
                         int d_maxSize)
{
      long index = get_global_id(0);

      if(index >= d_maxSize)
           return;

      uchar msg0[16], msg1[16];

      //reading 32-byte data from global memory
      d_memcpy1(msg0, &d_io_2d[(index*16)*2], 16);          //reading first 16-byte consecutive data
      d_memcpy1(msg1, &d_io_2d[((index*16)*2) + 16], 16);   //reading next 16-byte consecutive data

      //doing some stuff on these 32-byte data

      //writing back 32-byte data to same global memory
      d_memcpy2(&d_io_2d[(index*16)*2], msg0, 16);          //writing first 16-byte consecutive data
      d_memcpy2(&d_io_2d[((index*16)*2) + 16], msg1, 16);   //writing next 16-byte consecutive data
}
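Comparing the GPU output with a host-side reference makes it easier to see exactly which bytes go wrong. Below is a minimal C sketch of demoKernel's data movement, assuming one work-item per 32-byte record as in the kernel above. The step "doing some stuff on these 32-byte data" is not shown in the post, so `process_record` is a hypothetical placeholder that leaves the data unchanged and should be replaced with the real logic; `record_offset` and `demo_kernel_reference` are also illustrative names.

```c
#include <stddef.h>
#include <string.h>

#define HALF_BYTES 16 /* size of msg0 / msg1 in the kernel */

/* Byte offset of record `index`, written the same way as in the kernel. */
size_t record_offset(size_t index)
{
    return (index * 16) * 2;
}

/* Hypothetical placeholder for "doing some stuff on these 32-byte data":
 * the real processing is not in the post, so this leaves both halves as-is. */
static void process_record(unsigned char msg0[HALF_BYTES],
                           unsigned char msg1[HALF_BYTES])
{
    (void)msg0;
    (void)msg1;
}

/* Host reference for demoKernel over work-items 0..global_size-1. */
void demo_kernel_reference(unsigned char *io, size_t global_size, int max_size)
{
    for (size_t index = 0; index < global_size; index++) {
        if ((long)index >= (long)max_size)
            continue; /* same bound check as the kernel's early return */

        unsigned char msg0[HALF_BYTES], msg1[HALF_BYTES];
        unsigned char *record = io + record_offset(index);

        memcpy(msg0, record, HALF_BYTES);              /* first 16 bytes */
        memcpy(msg1, record + HALF_BYTES, HALF_BYTES); /* next 16 bytes */

        process_record(msg0, msg1);

        memcpy(record, msg0, HALF_BYTES);              /* write back first half */
        memcpy(record + HALF_BYTES, msg1, HALF_BYTES); /* write back second half */
    }
}
```

Running the same input through this function and through the GPU build gives a byte-for-byte baseline to compare against.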

I am using a machine with the following configuration:

GPU: Tahiti

Catalyst driver: 9.0.2

APP SDK version: OpenCL 1.2

OS: Ubuntu

Bitness: 64-bit

I hope this example helps.

When I changed the build options from "-g -O0" to NULL, the result for the first 16 bytes was wrong, but the result for the next 16 bytes was correct. When I set the options back to "-g -O0", I got correct results for both the first and the next 16 bytes.
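Since only the first 16 bytes of each record were wrong, a comparison that tallies mismatches separately for the two halves can confirm whether that pattern holds across all records. A minimal sketch, assuming `gpu` holds the buffer read back from the device (for example with clEnqueueReadBuffer) and `ref` holds the host reference output; `compare_halves` and `half_mismatches` are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

#define RECORD_BYTES 32 /* bytes handled by one work-item */
#define HALF_BYTES   16 /* size of msg0 / msg1 */

/* Mismatching byte counts in the first and second halves of the records. */
typedef struct {
    size_t first_half;
    size_t second_half;
} half_mismatches;

/* Compares device output `gpu` with the host reference `ref` over
 * num_records 32-byte records, prints the first mismatch, and returns
 * per-half mismatch counts. */
half_mismatches compare_halves(const unsigned char *gpu,
                               const unsigned char *ref,
                               size_t num_records)
{
    half_mismatches m = {0, 0};
    int reported = 0;

    for (size_t r = 0; r < num_records; r++) {
        for (size_t b = 0; b < RECORD_BYTES; b++) {
            size_t off = r * RECORD_BYTES + b;
            if (gpu[off] == ref[off])
                continue;
            if (!reported) {
                printf("first mismatch: record %zu, byte %zu: got 0x%02x, expected 0x%02x\n",
                       r, b, (unsigned)gpu[off], (unsigned)ref[off]);
                reported = 1;
            }
            if (b < HALF_BYTES)
                m.first_half++;
            else
                m.second_half++;
        }
    }
    return m;
}
```

A nonzero `first_half` count with `second_half` at zero across all records would match the symptom described above.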

Note: On an NVIDIA Tesla K20m card, the same program works correctly.
