Global memory for work item cache

Question asked by boxerab on Mar 9, 2017
Due to local memory limitations, I need to use global memory as a cache for my work items.

Suppose I have 1000 work groups with 64 work items each. Each work item needs 4K of cache, and the cache doesn't need to persist after the work item completes.

I will allocate a single global memory buffer and assign one 4K chunk to each work item.


(I am targeting AMD GPUs)


What is the minimum buffer size I would need to guarantee that no two concurrently running work items are ever assigned the same chunk?

Since AMD GPUs have <= 64 CUs, my guess is

64 * 128 * 4000 bytes, using (global work item ID % (64 * 128)) to assign a cache chunk to each work item.
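
To make the arithmetic concrete, here is a rough host-side sketch in plain C of the sizes and the chunk mapping described above. The function and macro names are hypothetical, and the constants (64 CUs, 128 work items per CU, 4000 bytes per chunk) are just the numbers from my guess, not confirmed hardware limits. It also computes the size of the no-sharing alternative (one private chunk per work item) for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Numbers taken from the question; all names here are hypothetical. */
#define CHUNK_BYTES   4000u  /* bytes per work item, as in the guess */
#define ASSUMED_CUS   64u    /* assumed upper bound on compute units */
#define ITEMS_PER_CU  128u   /* assumed concurrent work items per CU */
#define NUM_SLOTS     (ASSUMED_CUS * ITEMS_PER_CU)  /* 8192 chunks */

/* Total size of the shared buffer under the modulo scheme. */
size_t pool_bytes(void)
{
    return (size_t)NUM_SLOTS * CHUNK_BYTES;
}

/* Chunk index for a work item: global ID modulo the number of chunks. */
size_t slot_for(size_t global_id)
{
    return global_id % NUM_SLOTS;
}

/* Byte offset of that work item's chunk inside the buffer. */
size_t chunk_offset(size_t global_id)
{
    return slot_for(global_id) * CHUNK_BYTES;
}

/* Buffer size if every work item gets its own chunk (no sharing at all). */
size_t private_bytes(size_t num_groups, size_t items_per_group)
{
    assert(num_groups > 0 && items_per_group > 0);
    return num_groups * items_per_group * CHUNK_BYTES;
}
```

With these numbers, pool_bytes() is 32,768,000 bytes, while private_bytes(1000, 64) is 256,000,000 bytes. Note that slot_for(0) and slot_for(8192) are both 0, so the smaller buffer is only safe if work items whose global IDs differ by a multiple of 8192 can never be in flight at the same time, which is exactly the part I am unsure about.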