3 Replies Latest reply on Nov 7, 2014 4:43 AM by dipak

    intra thread communication with opencl

    thanasio

      Hi,

       

      I am coming to OpenCL with a CUDA background, and I was hoping to find a fast thread communication function similar to CUDA's __shfl(). I came across the work_group_reduce() function available in OpenCL, and I was wondering how it performs when used to reduce values held in registers.



      Cheers,

      T

       

        • Re: intra thread communication with opencl
          sudarshan

          Hi,

          work_group_reduce is part of OpenCL 2.0, and the AMD stack for OpenCL 2.0 is not out yet. OpenCL 2.0 will also have support for shuffle.

           

          Do you mean performance compared to a hand-coded kernel?

            • Re: intra thread communication with opencl
              thanasio

              By performance, I mean: does this use any memory buffers for the reduction (i.e. local or global), or just private memory?

              Regarding shuffle, do you mean that OpenCL 2.0 will have an instruction similar to CUDA's __shfl() (an exchange of private memory values within a wavefront), or that it will support OpenCL's shuffle(), which looks like a convenience function for reordering the components of a vector?

               

              Cheers,

              T

                • Re: intra thread communication with opencl
                  dipak

                  The work_group_reduce functions take built-in data types (e.g. half, int, uint, long, ulong, float or double) as input. The argument is passed by value, not as a pointer, so you can pass any value, including one read through a global or local memory pointer, as follows.

                   

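                  // The kernel name "reduce_min" is a placeholder.
                  kernel void reduce_min(global const int *inData, global int *outData)
                  {
                      uint gid = get_global_id(0);

                      ...

                      // Every work-item in the work-group receives the same minimum.
                      outData[gid] = work_group_reduce_min(inData[gid]);
                  }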

                   

                  In the above example, "inData" or "outData" could equally be a local memory pointer.
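
To make the result of the kernel above concrete, here is a minimal host-side sketch in plain C of what work_group_reduce_min() returns. It is only an emulation of the semantics for a 1-D range, not how any device implements the function. The function name emulate_work_group_reduce_min and the parameters global_size and group_size are hypothetical and were chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Host-side emulation of the value work_group_reduce_min() hands back.
 * Assumption for illustration only: this is NOT a device implementation.
 * All names here (function and parameters) are hypothetical. */
void emulate_work_group_reduce_min(const int *in_data, int *out_data,
                                   size_t global_size, size_t group_size)
{
    /* Assume the global size is a whole number of work-groups. */
    assert(group_size > 0 && global_size % group_size == 0);

    for (size_t base = 0; base < global_size; base += group_size) {
        /* Step 1: reduce the values contributed by one work-group. */
        int group_min = INT_MAX;
        for (size_t lid = 0; lid < group_size; ++lid) {
            if (in_data[base + lid] < group_min)
                group_min = in_data[base + lid];
        }

        /* Step 2: every work-item of that group sees the same result. */
        for (size_t lid = 0; lid < group_size; ++lid)
            out_data[base + lid] = group_min;
    }
}
```

With eight inputs and a group size of four, each group's four output slots all hold that group's minimum. The sketch says nothing about whether a real implementation keeps intermediate values in registers or in local memory; that depends on the vendor's implementation.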