10 Replies Latest reply on Sep 26, 2013 12:10 PM by shunyo

    Using local memory for compact kernel OpenCL

    shunyo

      I am trying to implement a compact operation in OpenCL. I also want to count the number of keys that fall into each bin, and I am using atomic_inc for that. However, I have read that atomic_inc on global memory is very expensive and that local memory can be used instead. Is there a way to use local memory to update the per-key counts more efficiently?

       

      __kernel void compact(__global int* inCode, __global int* isValid, __global int* Scan, __global int* outCode, __global int* numPoints, __global int* index)
      {
        int ig = get_global_id(0);
        int m = isValid[ig];
        int j = Scan[ig];
        int code = inCode[ig];
        if (m == true)
        {
          outCode[j] = code;
          index[j] = ig;
        }
        int num = numPoints[j-1];
        atomic_inc(&num);
      }
      
        • Re: Using local memory for compact kernel OpenCL
          himanshu.gautam

          Actually, you are not using local memory in the above code. If you declare a variable without an address space qualifier such as __global or __local, it is a private variable, visible only to the work-item (thread) that declares it.

           

          Please refer to the OpenCL 1.2 specification, section 6.5, Address Space Qualifiers.
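
The effect of the private copy can be seen even in plain C. The sketch below is only a sequential model of the two lines `int num = numPoints[j-1]; atomic_inc(&num);` from the question; the function names `increment_copy` and `increment_in_place` are made up for illustration and are not part of any API.

```c
#include <assert.h>

/* Mirrors the question's pattern: the value is copied into a private
   variable, so the increment never reaches the array. */
void increment_copy(int *num_points, int slot)
{
    int num = num_points[slot];  /* private copy of the counter */
    ++num;                       /* changes only the copy */
    (void)num;                   /* the result is discarded on return */
}

/* What was probably intended: increment the array element itself. */
void increment_in_place(int *num_points, int slot)
{
    assert(slot >= 0);           /* guard against the j - 1 == -1 case */
    ++num_points[slot];          /* atomic_inc(&numPoints[slot]) in a kernel */
}
```

In a kernel, the address passed to atomic_inc would need to point into __global or __local memory for the update to be visible to other work-items.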

            • Re: Using local memory for compact kernel OpenCL
              shunyo

              Yes, I am not using any local variables because I am not sure how to introduce them in this code. Every work-item that runs the kernel performs an atomic_inc to update the count. Can you guide me on the correct way to introduce local variables in the above code?

                • Re: Using local memory for compact kernel OpenCL
                  himanshu.gautam

                  Could you please describe your code logic briefly, so that I can help you with this? It looks like a histogram problem.

                    • Re: Using local memory for compact kernel OpenCL
                      shunyo

                      So what I need to do is to keep the unique keys along with the frequency of each key and the starting index of the unique key in the inCode array.

                      inCode - the input of the keys

                      isValid - a boolean flag that marks whether the element is the first occurrence of its key

                      Scan - the prefix sum (scan) of isValid, which gives the final position of each unique key.

                      outCode - output of unique keys

                      numPoints - frequency of each unique key

                      index - output of starting index of the unique key in the inCode.

                      So I am checking for the true value in isValid and inputting into the outCode and the index. The frequency of the key is going to increment every time.

                      Hopefully this helps.

                        • Re: Using local memory for compact kernel OpenCL
                          himanshu.gautam

                          Hi

                           

                          I understand what you are trying to do, but I cannot follow it without an example. It would be better if you gave an example and explained it. I cannot work out what the Scan array contains.

                            • Re: Using local memory for compact kernel OpenCL
                              shunyo

                              I am trying to build an octree on the GPU. Suppose the keys, the isValid flags and the Scan array are as follows:

                              #Keys    #isValid  #Scan
                              1000     1         0
                              1000     0         1
                              1001     1         1
                              1001     0         2
                              1001     0         2
                              1002     1         2
                              1002     0         3
                              1003     1         3
                              1004     1         4
                              1004     0         5
                              1004     0         5
                              1004     0         5

                               

                              After the compact kernel runs, only the unique keys are kept, at the positions given by the Scan array, along with the starting index and the frequency of each key:

                              #uniqueKeys  #numPoints  #index
                              1000         2           0
                              1001         3           2
                              1002         2           5
                              1003         1           7
                              1004         4           8
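
The mapping from the three input columns to the three output columns can be written down as a sequential C reference. This is only a sketch of the intended result, not the kernel: it assumes Scan is the exclusive prefix sum of isValid (as in the example data), and the function name `compact_reference` is hypothetical.

```c
#include <assert.h>

/* Sequential reference for the compact step (a model, not the kernel).
   scan[] is assumed to be the exclusive prefix sum of is_valid[].
   Returns the number of unique keys written. */
int compact_reference(const int *in_code, const int *is_valid,
                      const int *scan, int n,
                      int *out_code, int *num_points, int *index)
{
    int unique = 0;
    for (int ig = 0; ig < n; ++ig) {
        /* A valid element owns slot scan[ig]; a duplicate belongs to the
           slot opened by the preceding valid element, scan[ig] - 1. */
        int slot = is_valid[ig] ? scan[ig] : scan[ig] - 1;
        assert(slot >= 0);       /* element 0 must be marked valid */
        if (is_valid[ig]) {
            out_code[slot] = in_code[ig];
            index[slot] = ig;
            num_points[slot] = 0;  /* a kernel would zero numPoints beforehand */
            ++unique;
        }
        ++num_points[slot];      /* in a kernel this would be an atomic_inc */
    }
    return unique;
}
```

Run on the twelve rows above, this yields the five unique keys with counts 2, 3, 2, 1, 4 and starting indices 0, 2, 5, 7, 8.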

                               

                              himanshu.gautam does this help?

                                • Re: Using local memory for compact kernel OpenCL
                                  himanshu.gautam


                                  For this Scan data set the above code won't work, because when j = 0 you will end up accessing numPoints[-1], which is out of bounds.

                                  Is your code working correctly?

                                   

                                  One suggestion I would like to make: keep the numPoints counters in a local array and write each work-group's values to global memory at the end of the kernel. Place a barrier before the global update so that every work-item has finished updating the local array. To create a local array you must know its size, so pass it as a build option or hardcode it. Everything else can stay in global memory, because each element is accessed only once.
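
The suggestion above can be modeled on the CPU as follows. This is a sequential sketch under several assumptions, not a tested kernel: `WG_SIZE` and `count_with_local_bins` are hypothetical names, Scan is taken to be the exclusive prefix sum of isValid, and the outer loop stands in for the work-groups. In a real kernel `local_count` would be a `__local` array zeroed by the work-items (followed by a barrier), the increments would use `atomic_inc`, and the final update would use `atomic_add` on the `__global` array.

```c
#include <assert.h>

#define WG_SIZE 4  /* hypothetical work-group size; a build option in practice */

/* CPU model of per-work-group counting. num_points[] must be zeroed first. */
void count_with_local_bins(const int *is_valid, const int *scan, int n,
                           int *num_points)
{
    for (int start = 0; start < n; start += WG_SIZE) {  /* one work-group */
        int end = start + WG_SIZE < n ? start + WG_SIZE : n;
        int local_count[WG_SIZE] = {0};                  /* __local in a kernel */

        /* Slot of the group's first element. Slots grow by at most one per
           element, so the group touches at most WG_SIZE consecutive slots. */
        int base = is_valid[start] ? scan[start] : scan[start] - 1;
        assert(base >= 0);

        /* Phase 1: every work-item bumps a counter in local memory. */
        for (int ig = start; ig < end; ++ig) {
            int slot = is_valid[ig] ? scan[ig] : scan[ig] - 1;
            ++local_count[slot - base];                  /* atomic_inc on __local */
        }

        /* barrier(CLK_LOCAL_MEM_FENCE) would go here in a kernel. */

        /* Phase 2: one global update per non-empty local counter. */
        for (int l = 0; l < WG_SIZE; ++l) {
            if (local_count[l] > 0)
                num_points[base + l] += local_count[l];  /* atomic_add on __global */
        }
    }
}
```

The global update still has to be atomic in a kernel, because a run of equal keys can straddle two work-groups, but the number of global atomics drops from one per element to at most one per slot per work-group.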
