What does barrier do beyond what mem_fence already does?
From what I understand, mem_fence allows kernel execution to continue past the fence until it reaches a load/store operation, at which point it blocks until all pre-mem_fence work-group loads/stores have completed.
And barrier is even stricter: it blocks ALL execution (not just loads/stores) until every work-item in the work-group has reached the barrier.
Is this correct?
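To make the question concrete, here is a minimal sketch of the two cases I have in mind. The kernel names and the `tile` buffer are made up for illustration. The `#ifndef` block is only a stand-in so the snippet also compiles as plain C with a single simulated work-item; it is not part of OpenCL.

```c
/* Hypothetical host-side shim: lets the sketch compile as plain C with one
   simulated work-item. A real OpenCL compiler skips this block. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
static size_t get_local_id(unsigned dim)   { (void)dim; return 0; }
static size_t get_local_size(unsigned dim) { (void)dim; return 1; }
static void mem_fence(int flags) { (void)flags; }
static void barrier(int flags)   { (void)flags; }
#endif

/* Case A: only a fence between my store and my read of another
   work-item's slot. */
__kernel void reverse_with_fence(__global const int *in,
                                 __global int *out,
                                 __local int *tile)
{
    size_t lid = get_local_id(0);
    size_t n = get_local_size(0);

    tile[lid] = in[lid];
    mem_fence(CLK_LOCAL_MEM_FENCE);  /* orders this work-item's own accesses */
    out[lid] = tile[n - 1 - lid];    /* slot written by a different work-item */
}

/* Case B: same kernel, but every work-item must reach the barrier
   before any of them continues. */
__kernel void reverse_with_barrier(__global const int *in,
                                   __global int *out,
                                   __local int *tile)
{
    size_t lid = get_local_id(0);
    size_t n = get_local_size(0);

    tile[lid] = in[lid];
    barrier(CLK_LOCAL_MEM_FENCE);    /* whole work-group synchronizes here */
    out[lid] = tile[n - 1 - lid];
}
```

My assumption is that only the second version is safe once the work-group has more than one work-item, since the first gives no guarantee that the other work-items have written their `tile` entries yet.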
Also, does write_mem_fence:
- Wait for all pre-mem_fence stores to complete before allowing any later stores? OR
- Block post-mem_fence stores until ALL pre-mem_fence operations (loads as well as stores) have completed? OR
- Something else?
Sorry for the onslaught of questions, and thanks for all the help so far.