The OpenCL 1.0 and 1.1 specs state (Table 6.16):
All work-items in a work-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the barrier. This function must be encountered by all work-items in a work-group executing the kernel.
Let's assume the following (pseudo-)kernel:
__kernel void f( unsigned height , [...] ) {
    if ( get_global_id( y ) >= height ) return;
    [ ... ]
    barrier( CLK_LOCAL_MEM_FENCE );
    [ ... ]
}
I find the term "executing the kernel" ambiguous in this context. For the given snippet, the question is whether work-items are still "executing the kernel" after they have returned from it (in which case this code is illegal, because they never reach the barrier) or whether they are no longer executing the kernel (in which case the code is legal).
OpenCL API implementations (2.3 and 2.4) treat this case differently: I know of implementations that accept and execute such code flawlessly, whereas at least the AMD CPU implementation silently terminates kernel execution (it took me some time to find that out).
Is the AMD implementation merely stricter in that case (with the other implementations supporting code that is not required to work), or is it too restrictive? Any ideas/comments?
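For comparison, here is a variant that should be legal under either reading of the spec, because no work-item leaves the kernel before the barrier. It is only a minimal sketch: the parameters `data` and `scratch` and the body around the barrier are made up for illustration, and it assumes a 2D NDRange whose row stride equals `get_global_size( 0 )` and a `scratch` buffer with one float per work-item in the work-group.

```c
// Sketch only: "data", "scratch" and the work done around the barrier are
// hypothetical; only the guard pattern is the point.
__kernel void f( unsigned height, __global float *data, __local float *scratch )
{
    const size_t x     = get_global_id( 0 );
    const size_t y     = get_global_id( 1 );
    const size_t idx   = y * get_global_size( 0 ) + x;
    const size_t lid   = get_local_id( 1 ) * get_local_size( 0 ) + get_local_id( 0 );
    const size_t lsize = get_local_size( 0 ) * get_local_size( 1 );

    // Instead of returning early, remember whether this work-item is in range.
    const bool active = ( y < height );

    // Out-of-range work-items do not touch global memory ...
    scratch[lid] = active ? data[idx] : 0.0f;

    // ... but every work-item of the work-group still reaches the barrier.
    barrier( CLK_LOCAL_MEM_FENCE );

    // After the barrier, only in-range work-items do real work.
    if ( active )
        data[idx] = scratch[( lid + 1 ) % lsize];
}
```

The barrier sits outside any conditional, so the out-of-range work-items simply idle through it. This does not answer whether the early-return version is conforming; it only avoids depending on the answer.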