
twintip31 · 05-22-2013

Hi,

I am trying to see (on my HD7950 GPU) how the local memory of a work-group behaves across kernel executions.

The goal is to use the local memory inside the compute units to retain information across kernel executions.

Case 1: sharing information between work-items within the same kernel run

==================================================

This kernel is executed on N work-items in a single work-group (N is kept small enough not to exceed the work-group's resources).

This example works: if I execute it 8 times, I always get the same results for given input data in a.

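    __kernel void vector_add_gpu(__constant float *a,
                                 __global float *b,
                                 __local float *garbage)
    {
        private int i = (int)get_global_id(0);

        // Each work-item writes its own slot of the local buffer.
        garbage[i] = 1.0f * i;

        // Wait until every work-item of the work-group has written its slot.
        barrier(CLK_LOCAL_MEM_FENCE);

        // Read the slot written by the neighbouring work-item i+1.
        b[i] = 4 * a[i] + garbage[i + 1];
    }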

=> b[i] is computed from garbage[i+1], which was written by work-item i+1 (this works thanks to the barrier).
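Why the barrier makes case 1 deterministic can be modelled in plain C: within one work-group, `barrier(CLK_LOCAL_MEM_FENCE)` splits the kernel into two phases, and no work-item starts the second phase until every work-item has finished the first. The sketch below is a sequential host-side model, not OpenCL code; the name `run_one_workgroup` is hypothetical, and giving the last work-item a neighbour value of 0.0f is an assumption, since the original kernel reads one slot past the last one it writes.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of the case 1 kernel for ONE work-group of n work-items.
 * garbage must have room for n floats; it stands in for the __local buffer. */
static void run_one_workgroup(const float *a, float *b, float *garbage, size_t n)
{
    assert(n > 0);

    /* Phase 1 (before the barrier): work-item i writes garbage[i]. */
    for (size_t i = 0; i < n; ++i)
        garbage[i] = 1.0f * (float)i;

    /* barrier(CLK_LOCAL_MEM_FENCE): in this model, phase 1 has finished for
     * every work-item before any work-item starts phase 2. */

    /* Phase 2 (after the barrier): work-item i reads its neighbour's slot.
     * Assumption: the last work-item has no neighbour and uses 0.0f. */
    for (size_t i = 0; i < n; ++i) {
        float next = (i + 1 < n) ? garbage[i + 1] : 0.0f;
        b[i] = 4.0f * a[i] + next;
    }
}
```

With a = {1, 1, 1, 1}, the model gives b = {5, 6, 7, 4}: every result depends only on values written in phase 1, so repeated runs agree, which matches the behaviour reported for case 1 on a single work-group.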

Case 2: sharing information across different runs of the same work-item

=============================================

In this case, I modified the kernel source so that the garbage buffer is updated across all executions of the same work-item of index i:

    __kernel void vector_add_gpu(__constant float *a,
                                 __global float *b,
                                 __local float *garbage)
    {
        private int i = (int)get_global_id(0);

        // I am expecting garbage[i] to be incremented by 1.0*i
        // each time the kernel is enqueued...
        garbage[i] += 1.0f * i;

        barrier(CLK_LOCAL_MEM_FENCE);

        b[i] = 4 * a[i] + garbage[i + 1];
    }

=> b[i] should be computed from garbage[i+1], as accumulated by work-item i+1 over all M executions of the kernel.

Case 2 does not work: I get erratic results at random positions. I guess the local hardware memory cannot keep its state once a kernel has finished executing?

Is there a way to force it? Otherwise, what is the reason?

himanshu_gautam · 05-23-2013

Local memory is not guaranteed to persist across kernel executions.

Any code written on such assumptions will break.

himanshu_gautam · 05-23-2013

The contents of local memory are not preserved between kernel runs. This is specified in section 6.5.2 of the OpenCL specification.

Also, I would not expect case 1 to work properly either. Most likely you are running it on a very small NDRange.

Only __global memory is consistent across kernel runs. Even global memory is not consistent between work-groups within a single run: if a work-item in one work-group modifies data that a work-item in another work-group then reads, the read is not guaranteed to see the update.
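Since only __global memory survives between enqueues, one way to get the accumulation intended in case 2 is to keep the running value in a __global buffer that the host creates and zero-initialises once, then passes to every launch. The plain-C model below is a sketch under that assumption, not OpenCL code: `launch_accumulate` stands for one kernel enqueue, `run_launches` for the host enqueuing it m times, and `acc` plays the role of a hypothetical __global accumulator buffer; all three names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* One enqueue of the case 2 update, modelled sequentially over n work-items.
 * acc stands in for a hypothetical __global float buffer that the host
 * creates once and passes to every launch, so it keeps its contents. */
static void launch_accumulate(float *acc, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        acc[i] += 1.0f * (float)i; /* OpenCL sketch: acc[i] += 1.0f * i; */
}

/* Enqueue the update m times, as the host would with m kernel launches. */
static void run_launches(float *acc, size_t n, unsigned m)
{
    assert(acc != NULL || n == 0);
    for (unsigned k = 0; k < m; ++k)
        launch_accumulate(acc, n);
}
```

After m launches on a zeroed buffer, acc[i] holds m * i. The key design choice is that the state lives in a buffer owned by the host rather than in local memory, whose contents are undefined at the start of every launch.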

twintip31 · 05-23-2013

You said: "Also I do not expect case 1 to work properly either. Most-likely you are running it for a very small NDRange." Why?

Here in case 1, the goal is only that work-item #N uses data from its neighbouring work-item #N+1, nothing more. I didn't assume that local memory is persistent; I took a reduced number of work items so that I stay in the same work group, so that I can assume that the same local memory is used in the kernel execution.

himanshu_gautam · 05-24-2013

"I took a reduced number of work items so that I stay in the same work group, so that I can assume that the same local memory is used in the kernel execution"

Yes, that is the reason. I would not expect it to work if more than one work-group were running your kernel. In that case, the last work-item of work-group X would need data created by the first work-item of work-group Y. Each work-group has its own local memory, so that is not guaranteed to happen.
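The boundary problem described above can be checked with a small plain-C helper: in a 1-D NDRange split into work-groups of `local_size` work-items, work-item i reads from a neighbour in another work-group exactly when it is the last work-item of its own group. The function names below are hypothetical, and the model assumes global_size is a multiple of local_size, as OpenCL 1.x requires when an explicit local size is given.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if work-item i (global id) reads a neighbour i + 1 that belongs
 * to a different work-group of local_size work-items, 0 otherwise. */
static int neighbour_in_other_group(size_t i, size_t local_size)
{
    assert(local_size > 0);
    /* i is the last work-item of its group exactly when i + 1 is a multiple
     * of local_size, i.e. when i + 1 starts the next work-group. */
    return (i + 1) % local_size == 0;
}

/* Counts the work-items whose neighbour read crosses a work-group boundary
 * in an NDRange of global_size work-items (the final work-item is skipped
 * because it has no neighbour at all). */
static size_t count_cross_group_reads(size_t global_size, size_t local_size)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < global_size; ++i)
        count += (size_t)neighbour_in_other_group(i, local_size);
    return count;
}
```

With a single work-group (global_size == local_size) the count is zero, which is why case 1 behaves on a small NDRange; with 64 work-items in groups of 16, three work-items (global ids 15, 31 and 47) would read a slot that their own work-group never wrote.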


Can local memory for a given work group be used in a static way?