6 Replies. Latest reply on May 22, 2013 6:45 PM by twintip31

    Can local memory for a given work group be used in a static way?

    twintip31

      Hi,

      I am trying to find out (on my HD7950 GPU) how the local memory of a given work group behaves across kernel executions.

      The goal is to use the local memory inside each compute unit to retain information across kernel executions.

      Case 1: sharing info between work items in the same kernel run

      ==================================================

      This kernel is executed on N work items in a single work group (N is kept small enough not to exceed the work group's resources).

      This example works: if I execute it 8 times, I always get the same results for given input data in a:

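      __kernel void vector_add_gpu(__constant float *a,
                                   __global float *b,
                                   __local float *garbage)
      {
          // single work group, so the global id equals the local id here
          private int i = (int)get_global_id(0);
          garbage[i] = 1.0f * i;             // each work item writes its own slot
          barrier(CLK_LOCAL_MEM_FENCE);      // make the local writes visible to the whole group
          // note: garbage[N] is never written, so b[N-1] reads an undefined value
          b[i] = 4 * a[i] + garbage[i + 1];  // read the neighbour's slot
      }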

      => b[i] is computed from garbage[i+1], which was written by work item i+1 (this works thanks to the barrier).

      Case 2: sharing info across different runs of the same work item

      =============================================

      In this case, I modified the kernel so that the garbage buffer accumulates across all executions of the work item with index i:

      __kernel void vector_add_gpu(__constant float *a,
                                   __global float *b,
                                   __local float *garbage)
      {
          private int i = (int)get_global_id(0);
          // I expect garbage[i] to grow by 1.0*i each time the kernel is enqueued
          garbage[i] += 1.0f * i;
          barrier(CLK_LOCAL_MEM_FENCE);
          b[i] = 4 * a[i] + garbage[i + 1];
      }

      => b[i] should be computed from garbage[i+1], as accumulated by work item i+1 over all M executions.
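
      To pin down the values Case 2 is expected to produce, a small CPU reference model can help, and it can double as a checker for buffers read back from the GPU. This is a hypothetical sketch in plain C, not part of the original setup: `emulate_launch`, `expected_b`, `check_case2` and the work-group size `N = 8` are made-up names and values, and the barrier is modelled as two sequential loops over the work items of one work group. The persistent array stands in for memory that outlives a launch; on the GPU that role is played by a `__global` buffer that the host allocates once and passes to every enqueue, because `__local` memory only lives for the duration of a work group.

```c
#include <assert.h>

#define N 8 /* hypothetical work group size (number of work items) */

/* Emulates one enqueue of the Case 2 kernel on a single work group.
 * `state` stands in for memory that survives between launches (on the GPU,
 * a __global buffer kept alive by the host, not __local memory).
 * It has N + 1 entries so that work item N-1 can read state[N];
 * that padding slot is never written and stays 0 in this model. */
static void emulate_launch(const float a[N], float b[N], float state[N + 1])
{
    /* Before the barrier: each work item i updates only its own slot. */
    for (int i = 0; i < N; ++i)
        state[i] += 1.0f * (float)i;

    /* After the barrier: each work item reads its neighbour's slot. */
    for (int i = 0; i < N; ++i)
        b[i] = 4.0f * a[i] + state[i + 1];
}

/* Closed form after m launches, assuming state started as all zeros:
 * state[j] == m * j for j < N, and state[N] == 0 (padding). */
static float expected_b(const float a[N], int i, int m)
{
    float neighbour = (i + 1 < N) ? (float)m * (float)(i + 1) : 0.0f;
    return 4.0f * a[i] + neighbour;
}

/* Compares a result buffer (e.g. one read back from the device) with the
 * reference values; aborts via assert on the first mismatch. */
static void check_case2(const float a[N], const float b[N], int m)
{
    for (int i = 0; i < N; ++i) {
        float d = b[i] - expected_b(a, i, m);
        assert(d < 1e-4f && d > -1e-4f);
    }
}
```

      Usage: zero a `float state[N + 1]` once, call `emulate_launch` M times, then `check_case2(a, b, M)`. With M = 1 the model reproduces the Case 1 values b[i] = 4*a[i] + (i + 1), except for the last work item, whose neighbour slot is undefined on the GPU but 0 here.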

      Case 2 does not work: I get erratic results at random positions. I guess the local memory hardware does not keep its state after a kernel has finished?

      Is there a way to force it? Otherwise, what is the reason?