Is it optimal to declare local memory this way inside a kernel, to avoid passing local memory pointers as kernel arguments? I have the impression that __local variable arrays are mapped to global memory.
Thanks.
__kernel void MyKernel ( __global float4 *result )
{
    __local int a[128], b;
    __local float c[384];

    /* ... do some operations with a, b, c and output result */
}

vs

__kernel void MyKernel ( __global float4 *result,
                         __local int *a, __local int *b, __local float *c )
{
    /* ... do some operations with a, b, c and output result */
}
Local memory is mapped to global memory only on Radeon 4xxx cards. Otherwise the two forms behave the same; the only difference is that with a statically declared array you can find out how much local memory the kernel uses.
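For the second form, the host has to size each __local argument itself. The following is a minimal host-side sketch, assuming `kernel` is the pointer-argument variant of MyKernel already built for `device`; the helper name `setup_local_args` is hypothetical, and the argument indices and sizes simply mirror the question. It also queries `CL_KERNEL_LOCAL_MEM_SIZE`, which is one way to read "how much local memory the kernel uses".

```c
#include <stdio.h>
#include <CL/cl.h>   /* on macOS the header is <OpenCL/opencl.h> */

/* Hypothetical helper: sizes the three __local arguments of the
   pointer-argument MyKernel and reports the kernel's local memory use. */
cl_int setup_local_args(cl_kernel kernel, cl_device_id device)
{
    cl_int err;

    /* A __local pointer argument is set with a byte size and a NULL value;
       the runtime allocates that much local memory per work-group. */
    err = clSetKernelArg(kernel, 1, 128 * sizeof(cl_int), NULL);    /* a */
    if (err != CL_SUCCESS) return err;
    err = clSetKernelArg(kernel, 2, sizeof(cl_int), NULL);          /* b */
    if (err != CL_SUCCESS) return err;
    err = clSetKernelArg(kernel, 3, 384 * sizeof(cl_float), NULL);  /* c */
    if (err != CL_SUCCESS) return err;

    /* Ask how many bytes of local memory this kernel uses on this device. */
    cl_ulong local_bytes = 0;
    err = clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                                   sizeof(local_bytes), &local_bytes, NULL);
    if (err == CL_SUCCESS)
        printf("local memory used: %lu bytes\n", (unsigned long)local_bytes);
    return err;
}
```

With the first form none of the `clSetKernelArg` calls for a, b and c are needed, since the sizes are fixed at compile time; the query works the same way.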
I have seen issues with your first example, but only when the local array was passed on to other kernels/functions. The latter solution however presented no problems.
Refer to the OpenCL spec, Section 6.5.
Do you see any performance issues between these two methods?
Originally posted by: himanshu.gautam
Do you see any performance issues between these two methods?
I think so, but I need to run more tests. By the way, I'm using a Radeon 5750 with Catalyst 11.3.
I have seen issues with your first example, but only when the local array was passed on to other kernels/functions.
Well, I might want to do this:
int MyFuncI ( const int idx, __local int *lv )
{
    ...
    lv[idx] = idx&20 + (idx%3);
    ....
}

__kernel void MyKernel ( __global float4 *result )
{
    __local int a[128], b;
    __local float c[384];

    MyFuncI ( 10, a );
    ....
    /* result is a pointer, so store through it rather than assigning to it */
    *result = (float4) ( ((float)a[10]) * 0.1f );
}
I use local arrays defined inside the kernel, and pass them into other functions, without issue. What can be a problem is that local variables can be declared only in the outermost block of the kernel function.
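A small sketch of that scope rule, based on the address space section of the OpenCL spec (Section 6.5). The kernel name `ScopeExample` and its buffer are hypothetical and only illustrate where a __local declaration is accepted.

```c
__kernel void ScopeExample ( __global float *out )
{
    __local float ok[64];            /* valid: declared at kernel function scope */

    if (get_local_id(0) == 0) {
        /* __local float bad[64];       not allowed: declared in a nested block */
        ok[0] = 1.0f;
    }

    /* Make the write to local memory visible to the whole work-group. */
    barrier(CLK_LOCAL_MEM_FENCE);

    out[get_global_id(0)] = ok[0];
}
```

A __local array declared this way can still be handed to a helper function through a `__local` pointer parameter, as in the MyFuncI example.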