3 Replies Latest reply on Aug 23, 2013 4:11 AM by gopal_hc

    issue with -g flag compiler option of clBuildProgram




      To debug my OpenCL kernel, I used the "Debugging CPU Kernels with GDB" method described in the OpenCL Programming Guide, making two changes to the program:

      1.  Changed CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_CPU so that the program runs on the CPU.

      2.  Compiled the kernel by passing "-g -O0" as the options parameter of clBuildProgram().


      Debugging on the CPU went fine and I got the correct result.

      After changing the device type from CPU back to GPU, the result was still correct.

      But when I set the compiler options to NULL, with the device type still GPU, I got wrong results for a few inputs. When I set the options back to "-g -O0" on the GPU, I got the correct result for all inputs.


      So I want to know:

      1.  What actually happens when OpenCL kernels are built with the -g option?

      2.  Why do I get wrong results for a few inputs when I remove the -g option from clBuildProgram?


      Thanks !!

        • Re: issue with -g flag compiler option of clBuildProgram

          Can you attach your kernel here?

          Please also give details of your GPU, Catalyst driver, APP SDK version, OS, and bitness.

            • Re: issue with -g flag compiler option of clBuildProgram

              The kernel code is very large, so I will try to demonstrate the issue with a simple example:


              void d_memcpy1(uchar *dst, __global uchar *src, uint len)
              {
                      // copy len bytes from global memory into private memory
                      uint i;

                      for(i = 0; i < len; i++)
                        dst[i] = src[i];
              }


              void d_memcpy2(__global uchar *dst, uchar *src, uint len)
              {
                      // copy len bytes from private memory back to global memory
                      uint i;

                      for(i = 0; i < len; i++)
                        dst[i] = src[i];
              }



              __kernel void demoKernel(__global uchar *d_io_2d,
                                       __constant uint *d_common_input,
                                       int d_maxSize)
              {
                    long index = get_global_id(0);

                    // skip work-items beyond the end of the data
                    if(index >= d_maxSize)
                        return;

                   uchar msg0[16], msg1[16];

                   //reading 32-byte data from global memory
                   d_memcpy1(msg0, &d_io_2d[(index*16)*2], 16);         //reading first 16-byte consecutive data
                   d_memcpy1(msg1, &d_io_2d[((index*16)*2) + 16], 16);  //reading next 16-byte consecutive data

                   //doing some stuff on these 32-byte data

                   //writing back 32-byte data to same global memory
                   d_memcpy2(&d_io_2d[(index*16)*2], msg0, 16);         //writing first 16-byte consecutive data
                   d_memcpy2(&d_io_2d[((index*16)*2) + 16], msg1, 16);  //writing next 16-byte consecutive data
              }
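The access pattern of demoKernel can be reproduced on the host in plain C to get a reference result for comparison with the GPU output. This is only a sketch: placeholder_transform is a hypothetical stand-in for the elided "doing some stuff" step, and the function names below are not from the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_BYTES 16                 /* size of msg0 / msg1 in the kernel */
#define ITEM_BYTES (2 * HALF_BYTES)   /* bytes owned by one work-item */

/* Hypothetical stand-in for the elided processing: XOR every byte with 0xFF. */
void placeholder_transform(uint8_t *msg, size_t len)
{
    for (size_t i = 0; i < len; i++)
        msg[i] ^= 0xFF;
}

/* Emulates a single work-item of demoKernel on the host. */
void demo_kernel_reference_item(uint8_t *io, long index, int max_size)
{
    assert(io != NULL);
    if (index < 0 || index >= max_size)
        return;                       /* same bounds guard as the kernel */

    uint8_t msg0[HALF_BYTES], msg1[HALF_BYTES];
    size_t base = (size_t)index * ITEM_BYTES;

    /* read the two consecutive 16-byte halves */
    memcpy(msg0, io + base, HALF_BYTES);
    memcpy(msg1, io + base + HALF_BYTES, HALF_BYTES);

    placeholder_transform(msg0, HALF_BYTES);
    placeholder_transform(msg1, HALF_BYTES);

    /* write both halves back to the same location */
    memcpy(io + base, msg0, HALF_BYTES);
    memcpy(io + base + HALF_BYTES, msg1, HALF_BYTES);
}

/* Runs the reference for every work-item in [0, max_size). */
void demo_kernel_reference(uint8_t *io, int max_size)
{
    for (long index = 0; index < max_size; index++)
        demo_kernel_reference_item(io, index, max_size);
}
```

Calling demo_kernel_reference(io, max_size) on a copy of the input buffer gives a result that does not depend on the OpenCL compiler options at all.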



              I am using a machine with the following configuration:

              GPU : Tahiti,

              Catalyst Driver : 9.0.2

              APP SDK version : OpenCL 1.2

              OS : Ubuntu

              bitness: 64-bit


              I hope this example is helpful.

              When I changed the compiler options from "-g -O0" to NULL, the result for the first 16 bytes was wrong, but the result for the next 16 bytes was correct. When I set the options back to "-g -O0", I got the correct result for both the first and the next 16 bytes.


              Note: On an Nvidia Tesla K20m card, the same program works correctly.
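To narrow down the symptom described above (first 16 bytes wrong, next 16 bytes correct), one option is to compare the buffer read back from the GPU against a known-good result, for example the output of the "-g -O0" build, separately for each 16-byte half. A minimal sketch in host C, assuming both buffers are already in host memory; compare_halves and half_diff are hypothetical names, not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_BYTES 16   /* each work-item owns two 16-byte halves */

/* Number of work-items whose first / second 16-byte half differs. */
typedef struct {
    size_t first_half_mismatches;
    size_t second_half_mismatches;
} half_diff;

/* Compares device output with a reference, one 16-byte half at a time. */
half_diff compare_halves(const uint8_t *device_out,
                         const uint8_t *reference,
                         size_t work_items)
{
    assert(device_out != NULL && reference != NULL);

    half_diff d = {0, 0};
    for (size_t item = 0; item < work_items; item++) {
        size_t base = item * 2 * HALF_BYTES;

        if (memcmp(device_out + base, reference + base, HALF_BYTES) != 0)
            d.first_half_mismatches++;

        if (memcmp(device_out + base + HALF_BYTES,
                   reference + base + HALF_BYTES, HALF_BYTES) != 0)
            d.second_half_mismatches++;
    }
    return d;
}
```

If only first_half_mismatches is nonzero, the problem is confined to the accesses at offset (index*16)*2, which matches the behaviour reported here.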