I've been trying to work out how to reliably access the local memory that I reserve for each workgroup through a kernel argument, but I'm getting somewhat ambiguous results. I've searched for articles on this, to no avail.
I'm working with a large data set that produces 1944 bytes (486 uints) per work item. Rounding that up to a 2048-byte boundary, I'm looking to limit each workgroup to 16 work items to prevent overflowing local memory.
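To make the sizing arithmetic concrete, here is a rough host-side C sketch (not kernel code). The helper names round_up and max_items_per_group are made up for illustration, and it assumes a uint is 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures taken from the post; the names are illustrative only. */
enum {
    UINTS_PER_ITEM  = 486,    /* results produced per work item */
    SLOT_ALIGN      = 2048,   /* power-of-two boundary each slot is rounded to */
    LOCAL_MEM_BYTES = 32768   /* local memory available per workgroup */
};

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Number of whole slots that fit in one workgroup's local memory. */
static size_t max_items_per_group(size_t local_bytes, size_t slot_bytes)
{
    assert(slot_bytes != 0);
    return local_bytes / slot_bytes;
}
```

With these figures, 486 * sizeof(uint32_t) is 1944 bytes, round_up(1944, SLOT_ALIGN) gives 2048, and max_items_per_group(LOCAL_MEM_BYTES, 2048) gives 16 work items.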
I know that I have 32768 bytes available for each workgroup on a 7970, which I can access without problem through workgroups 0 to 31. My question is: what happens when I have 2048 workgroups, and how is the reserved memory addressed?
I had assumed that addressing workgroup 32 (and 64, 96, 128, etc.) would access the local memory of workgroup 0, i.e. group_id & 31, but I cannot seem to establish whether or not this is the case.
With only 16 work items, I would reference local memory by local_id << 11. Alternatively, it could be possible to use 256 work items, referencing local memory by (local_id & 15) << 11 and using atomic adds.
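The two indexing schemes can be sketched in plain C on the host side, just to check the arithmetic. The function names are hypothetical and this is not the kernel itself; it only shows which byte offset each local_id would land on.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SLOT_SHIFT = 11,   /* 1 << 11 == 2048 bytes per slot */
    SLOT_COUNT = 16    /* 16 slots * 2048 bytes == 32768 bytes */
};

/* Scheme 1: 16 work items, one private slot each. */
static size_t slot_offset_exclusive(size_t local_id)
{
    assert(local_id < SLOT_COUNT);
    return local_id << SLOT_SHIFT;
}

/* Scheme 2: 256 work items folded onto the same 16 slots.
 * Work items whose local_id agrees in the low 4 bits share a slot,
 * which is why updates to it would need to be atomic. */
static size_t slot_offset_shared(size_t local_id)
{
    return (local_id & (SLOT_COUNT - 1)) << SLOT_SHIFT;
}
```

Note that these are byte offsets. If the local buffer is declared as a pointer to uint, the element index would be the byte offset divided by 4, i.e. a shift of 9 rather than 11. In the shared scheme, local IDs 0, 16, 32, ... all map to offset 0, so 16 work items contend for each slot.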
Any clarification or insight into how I can best tackle this problem would be greatly appreciated.