Partial Histogram code as given in OpenCL code samples on GoogleCode

Question asked by shreedhar_pawar on Aug 22, 2013
Latest reply on Aug 27, 2013 by shreedhar_pawar

//******************************************************************
// This kernel takes an RGBA 8-bit-per-channel input image and
// produces a partial histogram for R, G, and B. Each work-group
// represents an image tile and computes the histogram for that
// tile.
//
// partial_histogram is an array of num_groups * (256 * 3) entries.
// Each entry is a 32-bit unsigned integer value.
//
// We store 256 R bins, followed by 256 G bins, and then the 256
// B bins.
//******************************************************************

 

kernel void
histogram_partial_image_rgba_unorm8(image2d_t img,
                                    global uint *histogram)
{
    int local_size = (int)get_local_size(0) * (int)get_local_size(1);
    int image_width = get_image_width(img);
    int image_height = get_image_height(img);
    int group_indx = (get_group_id(1) * get_num_groups(0)
                      + get_group_id(0)) * 256 * 3;

    int x = get_global_id(0);
    int y = get_global_id(1);

    local uint tmp_histogram[256 * 3];
    int tid = get_local_id(1) * get_local_size(0)
              + get_local_id(0);
    int j = 256 * 3;
    int indx = 0;

    // clear the local buffer that will generate the partial
    // histogram
    do
    {
        if (tid < j)
            tmp_histogram[indx + tid] = 0;
        j -= local_size;
        indx += local_size;
    } while (j > 0);

    barrier(CLK_LOCAL_MEM_FENCE);

    // read this work-item's pixel and update the local histogram
    if ((x < image_width) && (y < image_height))
    {
        float4 clr = read_imagef(img,
                                 CLK_NORMALIZED_COORDS_FALSE |
                                 CLK_ADDRESS_CLAMP_TO_EDGE |
                                 CLK_FILTER_NEAREST,
                                 (float2)(x, y));

        uchar indx_x, indx_y, indx_z;
        indx_x = convert_uchar_sat(clr.x * 255.0f);
        indx_y = convert_uchar_sat(clr.y * 255.0f);
        indx_z = convert_uchar_sat(clr.z * 255.0f);
        atomic_inc(&tmp_histogram[indx_x]);
        atomic_inc(&tmp_histogram[256 + (uint)indx_y]);
        atomic_inc(&tmp_histogram[512 + (uint)indx_z]);
    }

    barrier(CLK_LOCAL_MEM_FENCE);

    // copy the partial histogram to appropriate location in
    // histogram given by group_indx
    if (local_size >= (256 * 3))
    {
        if (tid < (256 * 3))
            histogram[group_indx + tid] = tmp_histogram[tid];
    }
    else
    {
        j = 256 * 3;
        indx = 0;
        do
        {
            if (tid < j)
                histogram[group_indx + indx + tid] =
                    tmp_histogram[indx + tid];
            j -= local_size;
            indx += local_size;
        } while (j > 0);
    }
}
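The clear and copy loops in the kernel both use the same strided do/while pattern, and it may help to see why that pattern touches each of the 768 bins exactly once for any work-group size. The following is only a host-side sketch in plain C, not OpenCL: the helper names count_bin_writes and covers_each_bin_once are hypothetical, and the outer loop over tid stands in for the work-items of one group, which on a real device run concurrently.

```c
#define NUM_BINS (256 * 3)

/* Hypothetical host-side helper (not part of the kernel): replays the
 * kernel's do/while loop for every tid in [0, local_size) and counts
 * how many times each bin index is written. */
void count_bin_writes(int local_size, int writes[NUM_BINS])
{
    for (int b = 0; b < NUM_BINS; ++b)
        writes[b] = 0;

    for (int tid = 0; tid < local_size; ++tid) {
        int j = NUM_BINS;  /* bins not yet covered by earlier passes */
        int indx = 0;      /* first bin of the current pass */
        do {
            if (tid < j)
                writes[indx + tid] += 1;
            j -= local_size;
            indx += local_size;
        } while (j > 0);
    }
}

/* Returns 1 if every bin is written exactly once, 0 otherwise. */
int covers_each_bin_once(int local_size)
{
    int writes[NUM_BINS];
    count_bin_writes(local_size, writes);
    for (int b = 0; b < NUM_BINS; ++b)
        if (writes[b] != 1)
            return 0;
    return 1;
}
```

With local_size = 600, for instance, the first pass covers bins 0 to 599 and the second pass lets only tid 0 to 167 cover bins 600 to 767.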

 

In the part of the kernel above where the image is read (the block guarded by if ((x < image_width) && (y < image_height))), how are the start and end limits of x and y known as group_indx changes? I ask because x and y are defined as global IDs at the start, and there is a barrier at the end of that part of the code. Do x and y run all the way to the end of the image in the first pass? If so, no partial histogram is actually computed, since all the pixel values would be accumulated into tmp_histogram in one go, which would take a long time because of the atomics.

But I know that this doesn't happen in one go: tmp_histogram is incremented first for the first group, then for the second group, and so on. How, then, do x and y know to stop at the end of the local size, given that they are global IDs?

Please answer in terms of the example below, so that it is clearer.

 

Suppose we have a 60x40 image and set the local size to 30x20. We then have 4 work-groups, i.e. the linear group index runs from 0 to 3 (so group_indx takes the values 0, 768, 1536 and 2304), and tid ranges from 0 to 599 for group 0. How do x and y know to stop at the local size, since they are global IDs? And how do they know where to start and where to end for groups 1, 2 and 3?
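For the 60x40 example, the relation between global IDs, group IDs and local IDs can be sketched in plain C. This assumes the standard OpenCL relation global_id = group_id * local_size + local_id with a zero global work offset. The helper names global_id, group_indx and id_range are hypothetical and only mirror the arithmetic; each work-item gets a single fixed (x, y) pair rather than iterating over a range.

```c
/* Standard OpenCL relation, assuming a zero global work offset:
 * global_id = group_id * local_size + local_id (per dimension). */
int global_id(int group_id, int local_size, int local_id)
{
    return group_id * local_size + local_id;
}

/* Mirrors the kernel's group_indx expression: the offset of a
 * work-group's 768-bin slice in the output histogram buffer. */
int group_indx(int group_id_x, int group_id_y, int num_groups_x)
{
    return (group_id_y * num_groups_x + group_id_x) * 256 * 3;
}

/* First and last global ID (in one dimension) owned by a work-group,
 * i.e. the IDs of its work-items with local_id 0 and local_size - 1. */
void id_range(int group_id, int local_size, int *first, int *last)
{
    *first = global_id(group_id, local_size, 0);
    *last = global_id(group_id, local_size, local_size - 1);
}
```

Under these assumptions, group (0,0) owns x in 0..29 and y in 0..19, group (1,0) owns x in 30..59 and y in 0..19, group (0,1) owns x in 0..29 and y in 20..39, and group (1,1) owns x in 30..59 and y in 20..39. Their group_indx offsets are 0, 768, 1536 and 2304 respectively.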
