5 Replies Latest reply on May 16, 2012 9:52 AM by MicahVillmow

    atomic operation

    Anon5710

      Hello,

       

      I have written a kernel whose job is to compare  2 groups of integers to each other, whenever there is a match counter is increased.

      To avoid running in the 128 MB buffer limit, i decided to use atomic operations to give me a index, which i use as a reference where to write when counter is high enough.

       

      __kernel void compare( __global int* input, __global int* output, const unsigned int count, __local int * shared , __local int * suspect, __local int * indexes, __global int* outIndex )
      {
          int x = get_global_id(0); //get global idx
          int y = get_global_id(1); //get global idy
      
          int local_idx = get_local_id(0);
      
          int k,i,j;
          int counter=0;
          int index;
              
          if( x < count && y < count && x != y)
          {
              //store each subgroup (of hashes) in local memory 
              //offset each variable by workgroup size bytes
              for(k=0;k<REP*2;k++)
              {
                  shared[local_idx+k*64] = input[x+k*count]; 
                  suspect[k] = input[y+k*count];            //does the one exception to bank conflicts also work when writing ?
              }
              barrier(CLK_LOCAL_MEM_FENCE);
      
              //store outindex global values in local index values
              //has a coalsed acces pattern
              indexes[local_idx] = outIndex[x];
                      
              for(i=0;i<REP*2;i++)
              {
                  for(j=0;j<REP*2;j++)
                  {
                      if(suspect[i] == shared[local_idx+j*64])
                      {
                          //printf("Succes : %d = %d -> (%d,%d) \n",suspect[i],shared[local_idx+j*64],x,y);
                          counter++;
                      }
                  }
              }
      
              //output[x+y*count] = counter;
              if(counter >  REP/2)
              {
                  index = atomic_add( &indexes[local_idx], 2 );
                  output[ x +index*count ] = y;
                  output[ x +index*count+count ] = counter;
                  //if( x == 65 )
                      //printf("(%d,%d,%d)\n",y,counter,index);
              }
              barrier(CLK_LOCAL_MEM_FENCE);
      
              //restore outindex in global memory;
              outIndex[x] = indexes[local_idx];
              barrier(CLK_LOCAL_MEM_FENCE);
      
          }
      }
      

       

      Now when this code is run with line 45 & 46 uncommented, it works perfectly i have no problem at all. I get 20 or so positive results.

      When line 45&46 are commented out, i get return 1 result, the last one.  So I'm guessing there is a problem with the atomic operation. But honestly, i have no clue how to fix this.

       

      Does anyone know what i'm doing wrong ?

        • Re: atomic operation
          Anon5710

          Hello again,

          I've asked my professor but he couldn't answer this question as-well.

           

          i've attached my complete scourcode below.

          (make to compile, ./v1-par-test to execute. )

           

          Hope this helps

            • Re: atomic operation
              LeeHowes

              Aren't you performing atomics on a non-shared value (addressing local memory with the local index)? I'm not clear what purpose the atomic is serving unless I'm misreading the code.

              1 of 1 people found this helpful
                • Re: atomic operation
                  notzed

                  Ahh, so it wasn't just me.  It just seems to copy a globally unique value to a locally unique cache, do some locally unique 'atomic' ops on it, and then copy it back to the globally unique array.

                   

                  (and not to mention, actually writing any results to an always globally unique slice anyway).

                   

                  Atomics must be shared between jobs to be any use.

                   

                  If you want a tight-packed result you need to use a single global atomic - use atomic counters on amd, global atomics are really slow.  Or if you want one tight-packed-result per work-group, you could have one atomic counter per workgroup, or whatever - but since you need to reserve output space anyway it probably wouldn't really help achieve your goal of reducing memory wastage.

                  1 of 1 people found this helpful
                    • Re: atomic operation
                      Anon5710

                      So to quote from the opencl specification:

                       

                      Global memory is consistent across

                      work-items in a single work-group at a work-group barrier, but there are no guarantees of

                      memory consistency between different work-groups executing a kernel.

                       

                       

                      (this is what the above 2 persons were saying right ? )

                       

                      So how does using printf actually guarantees  memory consistency between diffrent work group at a global level ?

                      (because this is what i am experiecing ?)

                       

                      As a side note..

                      Am i forced to use a global counter, to get a tighter packed result ? ( i still can use count*count  array but this quickly grows way to .... )

                      second, i only have seen a global atomics example, can someone point me to a global counter example ?

                       

                       

                      Regards Anon5710

                        • Re: atomic operation
                          MicahVillmow

                          printf does not guarantee memory consistency. What you are seeing is a by-product of our implementation where we only run a certain number of work-groups at a time when printf is used because of the issue with the printf buffer size not being able to grow dynamically.