Archives Discussions

philipjfry · ‎05-10-2011

The OpenCL 1.0 and 1.1 specs state (Table 6.16):

All work-items in a work-group executing the kernel on a processor must execute this function before any are allowed to continue execution beyond the barrier. This function must be encountered by all work-items in a work-group executing the kernel.

Let's assume the following (pseudo-)kernel:

__kernel void f( unsigned height , [...] ) {

    if ( get_global_id( 1 ) >= height ) return;  // dimension 1 is y
    [ ... ]
    barrier( CLK_LOCAL_MEM_FENCE );
    [ ... ]
}

I find the term "executing the kernel" ambiguous in that context. For the given snippet, the question is whether work-items are still "executing the kernel" after they have returned from it (in which case this code is illegal), or whether they are no longer executing the kernel.

OpenCL API implementations (2.3 and 2.4) treat this case differently. I know of implementations that accept and execute such code flawlessly, whereas at least the AMD CPU implementation silently terminates kernel execution (it took me some time to find that out).

Is the AMD implementation merely more restrictive in that case (with the other implementations supporting code that is not required to work), or is it too restrictive? Any ideas/comments?

MicahVillmow · ‎05-10-2011

The basic rule is this: if any thread in a work-group hits a barrier, then all threads in that work-group must hit the barrier. Failing to do so results in undefined behavior.
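
One way to satisfy this rule is to replace the early return with a flag, so that out-of-range work-items skip the actual work but still reach the barrier. The following is only a minimal sketch under stated assumptions: the kernel name `f_guarded`, the arguments `in`, `out` and `tile`, and the body are hypothetical (the original snippet elides them), the y-dimension is taken to be index 1, and each work-group is assumed to extend only along that dimension.

```c
// Hypothetical rewrite of the pseudo-kernel; names and body are illustrative only.
__kernel void f_guarded( unsigned height,
                         __global const float *in,
                         __global float *out,
                         __local float *tile )   // needs get_local_size(1) elements
{
    const size_t y   = get_global_id( 1 );
    const size_t lid = get_local_id( 1 );

    // Instead of returning early, remember whether this work-item is in range.
    const bool active = ( y < height );

    // Out-of-range work-items store a neutral value and skip the global load.
    tile[lid] = active ? in[y] : 0.0f;

    // Reached by every work-item in the work-group, in range or not.
    barrier( CLK_LOCAL_MEM_FENCE );

    // Only in-range work-items consume the staged values and write a result.
    if ( active ) {
        const size_t next = ( lid + 1 ) % get_local_size( 1 );
        out[y] = tile[lid] + tile[next];
    }
}
```

On the host side, the `__local` buffer would be sized by calling `clSetKernelArg` with a byte size and a NULL pointer. The point of the sketch is only the control flow: the range check guards the memory accesses, not the barrier.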

philipjfry · ‎05-11-2011

Thanks a lot for the clarification!

For all the other cases (loops and conditionals), the specs are quite clear. Only for this case did we find that a more general interpretation was possible, but AMD deliberately sticks to the more restrictive one. IMHO, things would be clearer without that "...executing the kernel".

Jawed · ‎05-11-2011

The pseudocode you presented is a "conditional" case. There is an implicit "else" containing all the code after the return statement.
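
Spelled out, that implicit "else" could look roughly like the sketch below. Only the control flow mirrors the snippet from the first post; the kernel name `f_explicit_else`, the arguments `data` and `tile`, and the body are hypothetical stand-ins for the elided parts, and the y-dimension is assumed to be index 1.

```c
// Hypothetical body; only the if/else structure reflects the original snippet.
__kernel void f_explicit_else( unsigned height,
                               __global float *data,
                               __local float *tile )
{
    const size_t y   = get_global_id( 1 );
    const size_t lid = get_local_id( 1 );

    if ( y >= height ) {
        return;                            // these work-items never reach the barrier
    } else {
        tile[lid] = data[y];
        barrier( CLK_LOCAL_MEM_FENCE );    // barrier now visibly inside a branch
        data[y] = tile[lid] * 2.0f;
    }
}
```

Written this way, the barrier sits inside a conditional, so the spec's rule for conditionals applies: if a work-group straddles `height`, some of its work-items take the `else` branch and others do not, and the behavior is undefined.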

barrier and return-ing work-items