5 Replies Latest reply on Apr 12, 2013 12:19 AM by himanshu.gautam

    constant memory issue

    roger

      Hi everyone,

       

      I already posted this on the Khronos forums, since I'm currently working on an NVIDIA card (I'll soon get a Radeon HD 7950), but I need to resolve this and NVIDIA removed their OpenCL forums.

       

      Here is my problem:

       

      My code works fine if an int buffer (2 elements) is in global memory (__global), but when I switch it to constant memory (__constant) the result becomes completely wrong.

       

      The weird thing is, I tried not using this buffer in the kernel at all and the result is still wrong; with __global it works fine.

       

      I don't really get why this address space qualifier breaks my code even when I don't use the buffer.

       

      Here is the kernel signature from the generated PTX code:

       

      .entry func(
         .param .align 4 .b8 func_param_0[52],
         .param .u32 func_param_1,
         .param .u32 func_param_2,
         .param .u32 .ptr .global .align 4 func_param_3,
         .param .u32 .ptr .global .align 64 func_param_4,
         .param .u32 .ptr .global .align 32 func_param_5,
         .param .u32 .ptr .global .align 16 func_param_6,
         .param .u32 .ptr .global .align 4 func_param_7,
         .param .u32 .ptr .const .align 4 func_param_8,
         .param .u32 .ptr .global .align 16 func_param_9,
         .param .u32 .ptr .global .align 16 func_param_10,
         .param .u32 .ptr .global .align 16 func_param_11,
         .param .u32 .ptr .global .align 4 func_param_12,
         .param .u32 .ptr .global .align 4 func_param_13,
         .param .u32 .ptr .global .align 4 func_param_14,
         .param .u32 .ptr .global .align 1 func_param_15,
         .param .u32 .ptr .global .align 4 func_param_16,
         .param .u32 .ptr .global .align 4 func_param_17,
         .param .u32 .ptr .global .align 16 func_param_18,
         .param .u32 .ptr .global .align 4 func_param_19,
         .param .u32 .ptr .global .align 16 func_param_20,
         .param .u32 .ptr .global .align 4 func_param_21,
         .param .u32 .ptr .global .align 16 func_param_22,
         .param .u32 .ptr .global .align 16 func_param_23
      )

       

       

      func_param_8 is const and completely breaks the code, even without being used.

       

      Does anyone have an idea? I am using a GeForce 560 Ti.

       

      ty

       

      Roger512

        • Re: constant memory issue
          himanshu.gautam

          I hope you are calling clSetKernelArg() in the right sequence. When you remove the argument, you need to revisit your clSetKernelArg() calls to check that the remaining arguments are still set correctly.

          Also, are you checking for errors in the kernel launch?

            • Re: constant memory issue
              roger

              There is no error at launch or during execution; the kernel computes images. When I switch to constant, the image is incorrect and the computing time rises from 1.3 ms to 145 ms. It's as if the constant qualifier completely messes up the other pointers.

               

              Nothing seems wrong with the clSetKernelArg sequence. I'll wait until I receive my new ATI cards to see if I can reproduce the bug.

                • Re: constant memory issue
                  himanshu.gautam

                  Hmm....Strange...

                   

                  What is even stranger is that the problem occurs even if your kernel does not access the "constant" memory.

                   

                  According to the OpenCL specification, __constant is just a region of global memory that is read-only to the kernel. Any write to it will result in a compilation error.

                  I am not sure how NVIDIA has implemented this. If they are using the dedicated "constant memory" on the NVIDIA GPU instead of global memory, then kernels will suffer a huge performance loss if every work-item accesses a different constant. According to NVIDIA's constant memory access guidelines, all threads in a half-warp need to access the same constant memory address. If they do not, the accesses are serialized. This shows up as "serializations" in the profiler output.
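                  To illustrate the serialization rule above, here is a minimal sketch in plain C. It is not OpenCL code and it does not reproduce NVIDIA's actual hardware behavior; it is only a simplified model that assumes a half-warp of 16 threads and that each distinct constant address read by that half-warp costs one serialized access. The names HALF_WARP and serialized_reads are made up for this example.

```c
#include <stddef.h>

/* Assumed group size, taken from the half-warp rule quoted above. */
#define HALF_WARP 16

/*
 * Simplified model (an assumption, not a hardware specification):
 * addr[i] is the constant-memory address read by thread i of one
 * half-warp. The function returns the number of distinct addresses,
 * which the model treats as the number of serialized accesses.
 */
static int serialized_reads(const size_t addr[HALF_WARP])
{
    int distinct = 0;

    for (int i = 0; i < HALF_WARP; ++i) {
        int seen = 0;

        /* Has an earlier thread already requested this address? */
        for (int j = 0; j < i; ++j) {
            if (addr[j] == addr[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            ++distinct;
        }
    }
    return distinct;
}
```

                  Under this model, a half-warp in which every thread reads the same element yields 1, while a half-warp in which thread i reads element i yields 16. That is the kind of slowdown to expect from divergent constant reads, but it would not by itself explain a kernel that never touches the buffer.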

                   

                  But since you say your kernel does not actually access the constant data, I really don't have anything more to say about this.

                  Please post back after you check with your AMD cards.

                   

                  Good Luck!