LDS belongs to a specific workgroup (1..3 wavefronts).
It is allocated right before the kernel starts and deallocated after the kernel finishes.
The allocation's properties (its offset inside the 64K LDS memory, and its size) are stored in hardware registers that you can't access from OpenCL.
The hardware ensures that you can't write into unallocated areas.
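To make the offset/size idea concrete, here is a toy Python model. It is not how the hardware is programmed. Each workgroup gets a hypothetical `LdsRegion` with a base offset and a size inside one 64 KB array, every access is relative to that base, and writes outside the region are simply dropped. The class name, the drop-on-overflow behaviour and the "out-of-range reads return 0" rule are assumptions made for illustration.

```python
LDS_BYTES = 64 * 1024  # LDS size of one GCN compute unit, as stated in the thread


class LdsRegion:
    """Toy model of one workgroup's LDS allocation: a base offset plus a size."""

    def __init__(self, lds, base, size):
        assert 0 <= base and base + size <= len(lds), "region must fit in the LDS"
        self.lds = lds    # the compute unit's whole LDS
        self.base = base  # allocation offset (a hardware register, invisible to OpenCL)
        self.size = size  # allocation size (also invisible to OpenCL)

    def write(self, offset, value):
        # Offsets are relative to this workgroup's base. In this model a write
        # outside the allocation is dropped, so it cannot touch another region.
        if 0 <= offset < self.size:
            self.lds[self.base + offset] = value
            return True
        return False

    def read(self, offset):
        # Assumption of the model: out-of-range reads just return 0.
        if 0 <= offset < self.size:
            return self.lds[self.base + offset]
        return 0


lds = bytearray(LDS_BYTES)
wg0 = LdsRegion(lds, base=0, size=16 * 1024)
wg1 = LdsRegion(lds, base=16 * 1024, size=16 * 1024)
print(wg0.write(0, 7))          # True
print(wg0.write(16 * 1024, 9))  # False: past the end of wg0's region
print(wg1.read(0))              # 0: wg1's first byte was never written
```

The kernel only ever sees offsets starting at 0; the base that places it inside the shared 64 KB is supplied by the hardware.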
Scoping: for LDS, block scoping is only a C language feature that makes your code prettier. The LDS memory is allocated outside the kernel, sized for the largest requirement, so it is large enough to cover the LDS needs of every scope.
LDS is a small amount of memory, so refilling it from the L2 cache is not a big deal, unless your kernel is only a few instructions long, i.e. does about as much work as initializing a few kilobytes of LDS.
Thanks for your fast reply and clarifications.
I still have some questions, though.
I imagine that the __local declaration reserves avg in the initially allocated LDS. The 64 work-items in my example use it. Does each of them get its own avg?
If there is only one common avg, who allocates it, and how do the rest know where to find it?
Who can access the avg in my example?
What do you mean by 'avg'? (I can't think of anything other than 'average', which obviously doesn't fit the context.)
By avg I mean the __local array in my initial post:
No special meaning, could have named it anything.
Oh, indeed: 'avg', your declaration.
So with localSize=64 and globalSize=64 there is only one allocation (4K*4 bytes = 16 KB), and workitem0..workitem63 all access the same avg values.
With localSize=128 and globalSize=512 there are 4 allocations of avg[4k]:
wi0..wi127 -> first allocation
wi128..wi255 -> second allocation
wi256..wi383 -> third allocation
wi384..wi511 -> fourth allocation
Total LDS allocation = 4K*4*4 = 64 KB. A side note: 64 KB is the maximum that can be allocated on a single GCN compute unit, so this amount of work just fits on one CU.
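The arithmetic above can be checked with a few lines of Python. This is a sketch assuming 4-byte elements in avg[4k] (e.g. float or int) and a globalSize that is a multiple of localSize; the helper names `lds_plan` and `allocation_index` are made up for illustration. `allocation_index` computes the same value a work-item would get from `get_group_id(0)` in a 1-D OpenCL kernel.

```python
LDS_PER_CU = 64 * 1024  # bytes of LDS on one GCN compute unit


def lds_plan(global_size, local_size, bytes_per_group):
    """Return (number of workgroups, total LDS bytes, fits on one CU at once)."""
    # OpenCL 1.x requires global_size to be a multiple of local_size.
    assert global_size % local_size == 0
    num_groups = global_size // local_size
    total = num_groups * bytes_per_group
    return num_groups, total, total <= LDS_PER_CU


def allocation_index(global_id, local_size):
    """Which workgroup's LDS allocation (0-based) a work-item uses."""
    return global_id // local_size


AVG_BYTES = 4096 * 4  # avg[4k] of 4-byte elements, as in the thread

print(lds_plan(64, 64, AVG_BYTES))    # (1, 16384, True)
print(lds_plan(512, 128, AVG_BYTES))  # (4, 65536, True)
print([allocation_index(i, 128) for i in (0, 127, 128, 383, 384, 511)])
# [0, 0, 1, 2, 3, 3]
```

A total above 64 KB does not make the launch fail; it only means that not all workgroups can hold their LDS on that CU at the same time.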
LDS is mainly for communication between neighboring work-items. It is also good for small lookup tables (as is the L1 cache).
So on the first request, the LDS mechanism allocates the array and returns a pointer to it. On subsequent requests from the same workgroup, it just returns that pointer, since the array is already allocated.
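The neighbor-communication pattern can be simulated in Python for one workgroup. This is an illustration only: every work-item stores its value in a shared local array, all of them wait at a barrier (in OpenCL C this would be `barrier(CLK_LOCAL_MEM_FENCE)`), and then each reads its left neighbor's slot. The barrier is modelled by running the two phases as separate loops; the function name `neighbour_diff` and the "difference with the left neighbor" computation are made-up examples.

```python
def neighbour_diff(group_values):
    """Simulate one workgroup: out[i] = x[i] - x[i-1] via a shared local array."""
    local_size = len(group_values)
    local_buf = [0] * local_size  # stands in for a __local array of local_size elements

    # Phase 1: every work-item writes its own value into LDS.
    for lid in range(local_size):
        local_buf[lid] = group_values[lid]

    # barrier(CLK_LOCAL_MEM_FENCE): all writes above complete before any read below.

    # Phase 2: every work-item reads its left neighbour's value from LDS.
    out = []
    for lid in range(local_size):
        left = local_buf[lid - 1] if lid > 0 else 0
        out.append(local_buf[lid] - left)
    return out


print(neighbour_diff([3, 5, 9, 10]))  # [3, 2, 4, 1]
```

Without the barrier, a work-item could read its neighbor's slot before the neighbor has written it; splitting the loops is what makes the model safe.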
To be precise, there are no subsequent requests.
Whenever a workgroup is assigned to a compute unit, the hardware waits until enough LDS memory is free on that unit and then allocates it. It launches the program for that particular workgroup, and after the workgroup finishes, it releases the LDS memory.
So when your kernel starts, you have to assume that the local variables contain garbage. There is no persistence at all: GDS is persistent, but LDS lives only for the lifetime of a workgroup.
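The wait/allocate/run/release cycle can be sketched as a toy first-in-first-out dispatcher for a single CU. Assumptions of this model: it only counts free LDS bytes (ignoring where in the 64 KB each region sits), every workgroup runs for a fixed duration, and other limits such as registers or wavefront slots are left out. The function name `schedule` is hypothetical.

```python
from collections import deque

LDS_PER_CU = 64 * 1024


def schedule(groups, lds_per_cu=LDS_PER_CU):
    """Toy FIFO dispatcher for one CU.

    groups: list of (lds_bytes, duration). Returns each group's start time.
    A group starts only when enough LDS is free; its LDS is released when it ends.
    """
    assert all(size <= lds_per_cu for size, _ in groups), "group could never start"
    pending = deque(enumerate(groups))
    running = []  # (end_time, lds_bytes) of resident workgroups
    free = lds_per_cu
    start = [None] * len(groups)
    now = 0
    while pending or running:
        # Release the LDS of every workgroup that has finished by 'now'.
        for _, size in [r for r in running if r[0] <= now]:
            free += size
        running = [r for r in running if r[0] > now]
        # Dispatch in order while the next workgroup's LDS request fits.
        while pending and pending[0][1][0] <= free:
            idx, (size, duration) = pending.popleft()
            free -= size
            start[idx] = now
            running.append((now + duration, size))
        # Jump to the next completion, when LDS can be freed again.
        if running:
            now = min(end for end, _ in running)
    return start


print(schedule([(16 * 1024, 10)] * 5))  # [0, 0, 0, 0, 10]
```

With 16 KB per workgroup, four groups fit in the 64 KB at once; the fifth waits until one of them releases its LDS.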
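Why the garbage appears can be shown with a toy Python model in which releasing an allocation does not clear it, so the next workgroup placed at the same offset sees whatever the previous one left behind. Assumptions: the model's LDS starts zeroed (real LDS contents are simply undefined), and `run_group` is a hypothetical stand-in for a kernel. Since OpenCL C does not allow an initializer on a __local variable, a real kernel has to fill its local array itself, typically followed by `barrier(CLK_LOCAL_MEM_FENCE)`.

```python
LDS_BYTES = 64 * 1024

lds = bytearray(LDS_BYTES)  # one CU's LDS; nothing clears it between workgroups


def run_group(group_id, base, size, initialize):
    """Toy workgroup: reports what slot 0 held on entry, then writes its id there."""
    if initialize:
        lds[base:base + size] = bytes(size)  # explicit zeroing done by the "kernel"
    seen_on_entry = lds[base]
    lds[base] = group_id
    return seen_on_entry


print(run_group(1, 0, 16, initialize=False))  # 0 (only because the model starts zeroed)
print(run_group(2, 0, 16, initialize=False))  # 1: leftover from group 1
print(run_group(3, 0, 16, initialize=True))   # 0: cleared before use
```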