Can local memory for a given work group be used in a static way (persisting across kernel executions)?

Question asked by twintip31 on May 22, 2013

Hi,

I am trying to see (on my HD7950 GPU) how local memory within the same work group behaves across kernel executions.

The goal is to use the local memory inside each compute unit to retain information across kernel executions.

Case 1: sharing info between work items within the same kernel run

==================================================

This kernel is executed on N work items in a single work group (N is kept small enough not to exceed the work-group resources).

This example works: if I execute it 8 times, I always get the same results for given input data in a.

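__kernel void vector_add_gpu(__constant float *a,
                             __global float *b,
                             __local float *garbage)
{
    // All N work items are in one work group, so the global id equals the local id here
    private int i = (int)get_global_id(0);
    garbage[i] = 1.0f * i;            // each work item writes its own slot
    barrier(CLK_LOCAL_MEM_FENCE);     // make every write visible to the whole work group
    // Note: for i == N-1, garbage[i+1] is past the end unless the buffer holds N+1 floats
    b[i] = 4 * a[i] + garbage[i + 1]; // read the value written by work item i+1
}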

=> b[i] is computed from garbage[i+1], which was written by work item i+1 (this works thanks to the barrier).

Case 2: sharing info across different runs of the same work item

=============================================

In this case, I modified the kernel so that the garbage buffer accumulates updates across all executions of the work item with index i.

__kernel void vector_add_gpu(__constant float *a,
                             __global float *b,
                             __local float *garbage)
{
    private int i = (int)get_global_id(0);
    // I expect garbage[i] to grow by 1.0*i each time the kernel is enqueued...
    garbage[i] += 1.0f * i;
    barrier(CLK_LOCAL_MEM_FENCE);
    b[i] = 4 * a[i] + garbage[i + 1];
}

=> b[i] should be computed from garbage[i+1], accumulated by work item i+1 over all M executions of the kernel.

Case 2 does not work: I get erratic results at random positions. I guess the local hardware memory does not keep its state after a kernel has finished?

Is there a way to force it? Otherwise, what is the reason?