From an Intel page on OpenCL:
Work-group functions, as the name implies, always operate in parallel over entire work-group. An implicit consequence from this fact is that any work-group function call acts as a barrier.
So, on AMD hardware, if I have a call to a work-group function, do I also need an explicit barrier call as well?
In my kernels, it looks like I do need this barrier.
The spec does not require a barrier inside these work-group functions, but it does require that they be executed by all work-items in the work-group, just as barriers must be. Whether a barrier is actually executed depends on the implementation, which may change in the future. Although a barrier may be used to implement these functions on certain platforms, one should not write code that expects an actual barrier to be executed inside any of them. If your code needs a barrier, it is better to use one explicitly.
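To illustrate "if your code needs a barrier, use it explicitly", here is a minimal sketch of one case where a barrier is needed for reasons unrelated to the work-group function itself: work-items exchange data through local memory right after the call. The kernel name, the arguments, and the neighbor-read pattern are hypothetical, chosen only for illustration. The sketch assumes OpenCL C 2.0 (built with -cl-std=CL2.0), a 1D NDRange whose global size is a multiple of the local size, and a `scratch` buffer holding one float per work-item in the group.

```c
// Hypothetical OpenCL C 2.0 kernel; names and data layout are illustrative only.
__kernel void shift_relative_to_max(__global const float *in,
                                    __global float *out,
                                    __local float *scratch)
{
    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    // Work-group function: every work-item in the group must reach this call.
    // Its return value is the group-wide maximum of the values passed in.
    const float group_max = work_group_reduce_max(in[gid]);

    // Each work-item writes its own slot in local memory.
    scratch[lid] = in[gid] - group_max;

    // Explicit barrier: makes the local-memory writes above visible to the
    // other work-items. This does not rely on any barrier that might or
    // might not exist inside work_group_reduce_max.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Read a slot written by a different work-item.
    out[gid] = scratch[(lid + 1) % lsz];
}
```

In this sketch the barrier protects the `scratch` exchange, not the reduction result. Removing it would leave the neighbor read racing with the neighbor's write, regardless of how the implementation realizes the work-group function.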
Thanks, Dipak. I guess my question is this: do I need to place a barrier after a work-group function to ensure that the returned result is correct, i.e. that all work-items have executed the work-group function? My assumption is that I don't need a barrier to ensure a correct result, but my experience was that I did need to add a barrier, or the result was not correct.
I don't think you need a barrier to get the expected result from a work-group function. For example, work_group_all returns a non-zero value if the predicate evaluates to non-zero for all work-items in the work-group. There is no need to place a barrier to ensure that the predicate has been evaluated for every work-item in the work-group; the built-in function takes care of that itself. As a programmer, you just need to ensure that every work-item in the work-group encounters the built-in function. If you observe a scenario where a barrier is needed to ensure the correctness of a work-group function, please share the repro code so we can investigate it.
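As a rough sketch of the work_group_all case described above, the kernel below uses the returned value directly, with no barrier after the call. The kernel name and buffers are hypothetical. It assumes OpenCL C 2.0 (built with -cl-std=CL2.0), a 1D NDRange whose global size is a multiple of the local size, an `in` buffer with one element per work-item, and an `out` buffer with one element per work-group.

```c
// Hypothetical OpenCL C 2.0 kernel; names are illustrative only.
__kernel void all_positive(__global const float *in,
                           __global int *out)
{
    const size_t gid = get_global_id(0);

    // The call sits in uniform control flow, so every work-item in the
    // work-group encounters it. Do not place it inside a branch that only
    // some work-items of the group take.
    const int all_pos = work_group_all(in[gid] > 0.0f);

    // The return value is already the group-wide answer in every work-item;
    // no barrier is needed before reading it.
    if (get_local_id(0) == 0)
        out[get_group_id(0)] = all_pos;
}
```

The branch on `get_local_id(0)` comes after the work-group function, so it does not affect which work-items reach the call.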
Thanks, Dipak. I will check again, but I was seeing artifacts in my program output unless I added my own barrier. It could be my own bug, though; I will confirm.