
Partial Histogram code as given in OpenCL code samples on GoogleCode

//******************************************************************
// This kernel takes an RGBA 8-bit-per-channel input image and
// produces a partial histogram for R, G, and B. Each work-group
// represents an image tile and computes the histogram for that
// tile.
//
// histogram (the partial-histogram output buffer) is an array of
// num_groups * (256 * 3) entries. Each entry is a 32-bit unsigned
// integer value.
//
// We store 256 R bins, followed by 256 G bins, and then the 256
// B bins.
//******************************************************************

kernel void
histogram_partial_image_rgba_unorm8(read_only image2d_t img,
                                    global uint *histogram)
{
    int local_size = (int)get_local_size(0) * (int)get_local_size(1);
    int image_width = get_image_width(img);
    int image_height = get_image_height(img);

    // offset of this work-group's 768 bins in the global histogram
    int group_indx = (get_group_id(1) * get_num_groups(0)
                      + get_group_id(0)) * 256 * 3;

    int x = get_global_id(0);
    int y = get_global_id(1);

    local uint tmp_histogram[256 * 3];

    // flattened local id of this work-item within its work-group
    int tid = get_local_id(1) * get_local_size(0) + get_local_id(0);
    int j = 256 * 3;
    int indx = 0;

    // clear the local buffer that will generate the partial
    // histogram
    do
    {
        if (tid < j)
            tmp_histogram[indx + tid] = 0;
        j -= local_size;
        indx += local_size;
    } while (j > 0);

    barrier(CLK_LOCAL_MEM_FENCE);

    if ((x < image_width) && (y < image_height))
    {
        float4 clr = read_imagef(img,
                                 CLK_NORMALIZED_COORDS_FALSE |
                                 CLK_ADDRESS_CLAMP_TO_EDGE |
                                 CLK_FILTER_NEAREST,
                                 (float2)(x, y));

        // round to nearest so that (v / 255.0f) * 255.0f maps back to v
        uchar indx_x, indx_y, indx_z;
        indx_x = convert_uchar_sat_rte(clr.x * 255.0f);
        indx_y = convert_uchar_sat_rte(clr.y * 255.0f);
        indx_z = convert_uchar_sat_rte(clr.z * 255.0f);

        atomic_inc(&tmp_histogram[indx_x]);
        atomic_inc(&tmp_histogram[256 + (uint)indx_y]);
        atomic_inc(&tmp_histogram[512 + (uint)indx_z]);
    }

    barrier(CLK_LOCAL_MEM_FENCE);

    // copy the partial histogram to the appropriate location in
    // histogram given by group_indx
    if (local_size >= (256 * 3))
    {
        if (tid < (256 * 3))
            histogram[group_indx + tid] = tmp_histogram[tid];
    }
    else
    {
        j = 256 * 3;
        indx = 0;
        do
        {
            if (tid < j)
                histogram[group_indx + indx + tid] =
                    tmp_histogram[indx + tid];
            j -= local_size;
            indx += local_size;
        } while (j > 0);
    }
}

In the part above where the image is read (the if ((x < image_width) && (y < image_height)) block), how are the start and end limits of x and y known as group_indx changes? I ask because x and y are defined as global ids at the start, and there is a barrier at the end of that part of the code. Do x and y run all the way to the end of the image in one go? If so, no partial histogram is really computed, because all the pixel values would be counted into tmp_histogram in one go, which would take a long time because of the atomics.

But I know that this doesn't happen in one go. First tmp_histogram is filled for the first group, then for the second group, and so on. But then how do x and y know to stop at the end of the local size, since they are global ids?


Please answer this in terms of the example below, so that it's clearer.

Now suppose we have an image of size 60x40 and set the local size to 30x20. Then we have 4 groups, i.e. group_indx goes from 0 to 3, and tid ranges from 0 to 599 for group 0. How do x and y know that they should stop at the local size, since they are global ids? And how do they know where to start and where to end when group_indx is 1, 2 or 3?

1 Solution
himanshu_gautam
Grandmaster

Hi

Of course, this is a very nice question; most beginners have this doubt.

Note: the global id is a unique id for a work-item across the whole launch, whereas the local id is only unique within its work-group.

In the above code, x and y are checked against the image width and height, which is correct. x and y get the global ids of the work-items within a work-group, and they never go beyond that work-group's tile, because in each dimension the global id is get_group_id * get_local_size + get_local_id.

For example, assume the work-group size is 10 and the image has 100 pixels (1D). Then there are 10 work-groups.

When the first work-group executes the kernel, its work-items get global ids 0 to 9; none of them gets 10. The second work-group gets 10 to 19, and so on.

In this case the programmer can get the pixel value directly from the global id. Alternatively, you can do the same with the local id, but then you must convert it to the pixel position in the image yourself by adding the group's offset, get_group_id(0) * get_local_size(0).

I hope you can now relate your example to the above.

Let me know if it is still not clear.



Hey Himanshu,

I understand what you say, and it's also clear for a 2-D image. But does the order in which tid, x, y and group_indx are defined matter in any case? If it does, how does it affect the execution of each work-group? Please give an example of an order that goes wrong.

One more question, and this one is really important.

Even if x and y work independently for each work-group, does this happen in parallel? For every partial histogram we have a tmp_histogram, but it is only big enough for one partial histogram. So my real question is: are there as many tmp_histograms as there are work-groups allocated in memory at the same time (one per work-group, i.e. 4 tmp_histograms for my 4 work-groups), and are they all being processed in parallel? Or is there only one tmp_histogram at a time, which is re-initialized when the next partial histogram is computed?



Yes, the way tid, x, y and group_indx are defined plays a vital role in accessing the exact pixel value and updating the right entry in tmp_histogram.

You can check this by plugging in some numbers and calculating by hand, especially for the update of the histogram array that the kernel receives as a global argument.

As I understand it, this kernel works as follows:

1. It gets the global ids to access the pixel values.

2. It creates tmp_histogram (in local memory) of size 256 * 3 and updates it based on each pixel value.

3. The global histogram received as an argument has size num_of_workgroups * 256 * 3.

4. All the work-groups must update their part of this array correctly, using group_indx and tid.

I cannot actually give you an order where it goes wrong.

All the work-groups can run in parallel, as many at once as the device's resources allow. Each work-group has its own tmp_histogram of size 256 * 3.

For example, if you have 4 work-groups, there will be 4 tmp_histograms. Each one is local to a single work-group, so all the work-items of that group update it in parallel.

Hey Himanshu,

  Thanks for clearing up my second doubt. On my first doubt: I understand that tid needs to be defined after x and y, since it decides the local limits, but I am really not clear about what will happen if I define group_indx just after x and y. This is in contrast to the original code, where group_indx is defined before x and y. Will this affect the way the code is executed?
