I have a 32 x 4 work group, and each row of 32 work items uses its own block of 33 bytes of local memory.
So I have allocated 33 * 4 = 132 bytes of local memory for each work group.
Is it possible to reduce the amount of local memory used? That is, can each half-wavefront
(one row of 32 work items) share the same 33 bytes of local memory instead of having its own block?
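
To make the current layout concrete, here is a minimal sketch of the arithmetic in plain host-side C, not the actual kernel. The names `ROW_WIDTH`, `NUM_ROWS`, `ROW_STRIDE`, `local_offset` and `local_bytes_per_group` are hypothetical, and it assumes that row r's block starts at byte r * 33 of the work group's local memory.

```c
#include <stddef.h>

/* Hypothetical names; the sizes are the ones stated in the question. */
enum {
    ROW_WIDTH  = 32,  /* work items per row of the work group */
    NUM_ROWS   = 4,   /* rows per work group */
    ROW_STRIDE = 33   /* bytes of local memory used by each row */
};

/* Byte offset of byte `slot` (0..32) inside the block owned by row `row` (0..3).
   Assumes the four blocks are laid out back to back. */
static size_t local_offset(size_t row, size_t slot)
{
    return row * ROW_STRIDE + slot;
}

/* Total local memory needed per work group when every row has its own block. */
static size_t local_bytes_per_group(void)
{
    return (size_t)NUM_ROWS * ROW_STRIDE;
}
```

With this layout `local_bytes_per_group()` gives 132, and the last byte of the last row sits at `local_offset(3, 32)`, which is 131. Sharing one block between all rows would correspond to dropping the `row * ROW_STRIDE` term, which is the reduction I am asking about.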