Let's assume we have a 1D kernel with a global and local work size of 64 (a single work-group). It goes like this:
__kernel void test(/* args omitted */)
{
    __local float avg;
}
avg could be allocated by each work-item in the group. That would result in 64 separate allocations, which (for anything bigger than a single float) would easily overrun the 32 KB LDS limit. Furthermore, it wouldn't be common to the group.
So it's presumably not allocated per item. Who allocates it, then? If only the first item to access it does, how do the remaining items know where in LDS avg is?
In C, a variable declared like this has local scope: avg would be visible only to this item and only within test. That is not the case in OpenCL, though, if it is shared by the group.
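To make the question concrete, here is a minimal sketch of how I understand a group-shared avg is meant to be used. The kernel name test_avg, the in/out buffers and the averaging itself are made up for illustration and are not my real kernel. It also assumes a zero global offset and that, as I read the OpenCL C spec, a __local variable declared at kernel scope exists once per work-group rather than once per item.

```c
// Hypothetical kernel: the name, the arguments and the averaging are
// assumptions made for illustration only.
__kernel void test_avg(__global const float *in, __global float *out)
{
    // Declared at kernel-function scope: as far as I understand, one
    // instance per work-group, not one per work-item.
    __local float avg;

    const size_t lid  = get_local_id(0);
    const size_t n    = get_local_size(0);      // 64 in the setup above
    const size_t base = get_group_id(0) * n;    // assumes zero global offset

    // A single work-item computes the value and stores it.
    if (lid == 0) {
        float sum = 0.0f;
        for (size_t i = 0; i < n; ++i)
            sum += in[base + i];
        avg = sum / (float)n;
    }

    // Make the store visible to the whole group before anyone reads it.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Every work-item now reads the same avg.
    out[base + lid] = in[base + lid] - avg;
}
```

If this works the way I think it does, every item has to see avg at the same LDS address, which is exactly why I am asking who decides that address and when.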
Unfortunately, CodeXL won't display __local variables, and printf is very problematic from within the kernel :-(
Does each item have its own stack? How is it implemented, in SGPRs?