Is there any way to make local memory persistent across multiple kernel executions? I need this because I execute the same kernel repeatedly with different values (a multi-pass approach), and I'm currently accumulating a value in global memory. So I was wondering whether it would be possible to accumulate the value in local memory and move the final result to global memory in the last pass.
thx
Currently there is no way to ensure persistence in local memory.
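Global memory, by contrast, does keep its contents between launches as long as the buffer is not released or overwritten, which is why the accumulate-in-global approach from the question works. A minimal sketch of one such pass, assuming a hypothetical kernel name accumulate_pass and an accumulator buffer acc that the host zeroes before the first pass:

```c
/* Sketch only: accumulate_pass and acc are illustrative names, not from the thread. */
__kernel void accumulate_pass ( const float k,       /* this pass's value */
                                __global float *acc ) /* zeroed by the host before pass 0 */
{
    /* Flattened 2D global index, one accumulator slot per work-item. */
    const size_t gid = get_global_id(1)*get_global_size(0)+get_global_id(0);

    /* The buffer contents survive between launches, so the sum carries over. */
    acc[gid] += k;
}
```

The host would only need to update k with clSetKernelArg before each enqueue; after the last pass acc already holds the final result.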
Is it just one kernel you keep calling?
Is it possible to use local memory and a loop inside the kernel? Have you tried this? If so, what kind of performance difference did you see between this approach and the multiple-kernel/global approach?
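A minimal sketch of that loop-inside-the-kernel idea, assuming the per-pass values can be uploaded up front. The kernel name accumulate_all_passes and the num_passes and k_values arguments are hypothetical and only there for illustration:

```c
/* Sketch only: all passes run inside a single launch, so local memory
   never has to survive between kernel executions. */
__kernel void accumulate_all_passes ( const int num_passes,
                                      __global const float *k_values, /* hypothetical: one k per pass */
                                      __local float *loc,
                                      __global float *outp )
{
    const int l1d = get_local_id(1)*get_local_size(0)+get_local_id(0);
    int pass;

    /* Each work-item initialises and updates only its own slot,
       so no barrier is needed between the steps below. */
    loc[l1d] = 0.0f;
    for ( pass=0; pass<num_passes; ++pass )
    {
        loc[l1d] += k_values[pass];
    }

    /* Write the final value to global memory once, at the end. */
    outp[get_global_id(1)*get_global_size(0)+get_global_id(0)] = loc[l1d];
}
```

Since every work-item touches only its own element here, a private float would do the same job; local memory only pays off once work-items in the group share the accumulated data.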
Is it just one kernel you keep calling?
Yep, same kernel, just different args/constants.
For example:
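/* What I would like to do: loc would have to keep its contents between launches. */
__kernel void main ( const int pass, /* from 0 to 255 */
const float k,
__local float *loc,
__global float *outp )
{
const int l1d = get_local_id(1)*get_local_size(0)+get_local_id(0);
switch(pass)
{
case 0: /* the first pass: reset the accumulator */
loc[l1d] = 0.0f;
break;
case 255: /* the last pass: copy the result to global memory */
outp[get_global_id(1)*get_global_size(0)+get_global_id(0)] = loc[l1d];
break;
default: /* intermediate passes: accumulate */
loc[l1d] += k;
break;
}
}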
Host code:
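cl_int i;
for ( i=0; i<=255; ++i ) /* passes 0 to 255, so the last pass is reached */
{
clSetKernelArg(kernel,0,sizeof(cl_int),&i);
clSetKernelArg(kernel,1,sizeof(cl_float),&k); /* k: this pass's value */
if ( 0==i ) clSetKernelArg(kernel,2,WORK_SIZE,NULL); /* local buffer, set only once */
clSetKernelArg(kernel,3,sizeof(cl_mem),&myGlobalBuffer);
clEnqueueNDKernel...
}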
You should add another flag to the clCreateBuffer function: CL_LOCAL_MEM.
Also modify clSetKernelArg so that it won't accept a NULL data pointer.
That way we could have persistent local memory as long as we don't call clSetKernelArg again with a different data pointer.
Originally posted by: MicahVillmow Local memory is not guaranteed to be persistent by the hardware between kernel invocations, so this is not possible.
But perhaps for a future spec or HW?
There is 64KB of shared global memory though, right? Although currently this is not exposed.
Am I correct in saying that you are going to expose this via an OpenCL extension? If so, when is this likely to be?
Ah I see, so I suppose global shared memory is really designed for communication between different work groups of the same kernel call.
This still seems really useful -- any idea when it will be exposed?