From an Intel page on OpenCL:
"Work-group functions, as the name implies, always operate in parallel over entire work-group. An implicit consequence from this fact is that any work-group function call acts as a barrier."
So, on AMD hardware, if I have a work-group function call, do I also need an explicit barrier call?
In my kernels, it looks like I do need this barrier.