I have a 32 x 4 work group, and each row of 32 work items uses a block of 33 bytes of local memory.
So, I have allocated 33*4 == 132 bytes of local memory for each work group.
Is it possible to reduce the amount of local memory used, i.e. can each half-wavefront
use the same 33 bytes of local memory?
Of course not. If you use it, you use it. How do you expect to get a meaningful answer without even explaining what you're doing?
For the purpose of memory allocation, consider all 64 work items in a wavefront to be executing concurrently.
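To make that concrete, here is a rough sketch in plain C (host-side arithmetic only, not kernel code) of the layout described in the question. The function and constant names are made up for illustration, and it assumes work items are packed into wavefronts row by row, with the x index varying fastest. Under that assumption two 32-wide rows land in the same 64-wide wavefront, so their 33-byte blocks are live at the same time and cannot overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the question; the names are hypothetical. */
enum {
    ROW_WIDTH      = 32, /* work items per row (local size in x) */
    ROWS_PER_GROUP = 4,  /* rows per work group (local size in y) */
    BYTES_PER_ROW  = 33, /* local memory used by one row */
    WAVEFRONT_SIZE = 64  /* work items per wavefront */
};

/* Total local memory needed by one work group, in bytes. */
size_t local_bytes_per_group(void)
{
    return (size_t)ROWS_PER_GROUP * BYTES_PER_ROW;
}

/* Byte offset of the block owned by row local_y within the group's local memory. */
size_t row_block_offset(size_t local_y)
{
    assert(local_y < ROWS_PER_GROUP);
    return local_y * BYTES_PER_ROW;
}

/* Index of the wavefront that row local_y falls into.
 * Assumption: work items are linearized row-major (x fastest). */
size_t wavefront_of_row(size_t local_y)
{
    assert(local_y < ROWS_PER_GROUP);
    return (local_y * ROW_WIDTH) / WAVEFRONT_SIZE;
}
```

With these numbers, rows 0 and 1 share wavefront 0 and rows 2 and 3 share wavefront 1, while the four blocks start at offsets 0, 33, 66 and 99, which adds up to the 132 bytes already allocated.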