I have a work group of size 128. I can guarantee that, within each of the four subsets of 32 consecutive work items,
all items in the subset hit a local memory barrier together. Across subsets, however, this may not be the case.
Will this be a problem? Do all 128 items need to hit the barrier? What are the risks if not all of them do?
Subset 0: 0, 1, 2, ..., 31
Subset 1: 32, 33, ..., 63
...
Subset 3: 96, 97, ..., 127