
neko
Adept I

Cards supporting OpenCL 2.0 execute kernels in random order even though the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE flag is not specified.

Hello,

We found a strange bug where kernels are executed in the wrong order even when the command queue was not created with the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE flag. This affects only cards that support OpenCL 2.0; see the attached minimal example for a demonstration. The expected output is:

Advanced Micro Devices, Inc., OpenCL 2.0 ...

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

OpenCL version 1.2, OK

while we are getting:

Advanced Micro Devices, Inc., OpenCL 2.0 ...

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

OpenCL version 1.2, FAIL (value[2] is 222 but it should be 333)

This clearly shows that the first kernel is executed after the second one.

NOTE:

- The executed kernels share the same program and name; they differ only in parameter values.

- Changing the build option at line 63 of the attached example bug5.cpp from "-cl-std=CL1.2" to "-cl-std=CL2.0" seems to fix the problem. However, we use SPIR for obfuscation, so this is not really an option: SPIR 2.0 is only at the provisional stage and still does not work properly.

- Moving the local array zero[4] from line 85 to outside the for loop fixes all kernel executions except the first two.

- Removing const from line 1 of kernel.cl (__global const int * const buffer => __global int * const buffer) also seems to help; sadly, this works only for this simple example.

- Possibly another bug: if the -cl-std= option is followed by junk rather than whitespace, e.g. -cl-std=abc, it gets evaluated as CL2.0 instead of reporting an error.
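Regarding the last note, a host-side guard can reject a malformed -cl-std= value before the options string reaches the compiler. The following is only a sketch: the function name cl_std_options_are_valid is made up for illustration, and the accepted list (CL1.1, CL1.2, CL2.0) is an assumption that should be adjusted to whatever your platform actually supports.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true only if every "-cl-std=" token in the
// build options names a version from the list below. The list itself is an
// assumption, not something taken from the driver.
bool cl_std_options_are_valid(const std::string& options) {
    static const char* const kKnown[] = {"CL1.1", "CL1.2", "CL2.0"};
    const std::string prefix = "-cl-std=";

    std::istringstream tokens(options);
    std::string token;
    while (tokens >> token) {  // split on whitespace
        if (token.compare(0, prefix.size(), prefix) != 0) {
            continue;  // some other build option, not checked here
        }
        const std::string value = token.substr(prefix.size());
        bool known = false;
        for (const char* k : kKnown) {
            if (value == k) known = true;
        }
        if (!known) return false;  // e.g. "-cl-std=abc"
    }
    return true;
}
```

Calling this on the options string before the program is built turns the silent CL2.0 fallback described above into an explicit error on the host side.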

Confirmation, workaround suggestions or, even better, a fix would be greatly appreciated.

neko
Adept I

This behavior is caused by a feature of AMD cards, starting with the Southern Islands generation, where independent kernels can be executed in parallel on different compute units. In this case, the kernels are considered independent because of the __global const int * const buffer declaration. Further reading can be found here: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_G... (section 1.5.4 Synchronization Caveats).

Thanks, Benjamin.
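To make the explanation above concrete, here is a simplified model of the kind of dependency check it implies. This is not AMD's actual scheduler; BufferUse and may_overlap are hypothetical names, and the rule (two kernels conflict only if they share a buffer and at least one declares it writable) is an assumption used for illustration.

```cpp
#include <vector>

// Hypothetical description of how one kernel argument uses a buffer.
struct BufferUse {
    int buffer_id;  // stands in for a cl_mem object (illustrative only)
    bool writes;    // false for an argument declared `__global const`
};

// Simplified hazard check: two kernels are treated as independent unless
// they touch the same buffer and at least one of them declares it writable.
bool may_overlap(const std::vector<BufferUse>& first,
                 const std::vector<BufferUse>& second) {
    for (const BufferUse& a : first) {
        for (const BufferUse& b : second) {
            if (a.buffer_id == b.buffer_id && (a.writes || b.writes)) {
                return false;  // declared dependency: keep submission order
            }
        }
    }
    return true;  // no declared hazard: kernels may run concurrently
}
```

Under this model, two kernels that both take the buffer as const are reported as free to overlap, while dropping const from the argument makes the check return false, which is consistent with the observation in the original post that removing const seemed to help.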



Besides, the only thing affected by OUT_OF_ORDER or IN_ORDER is submission time. Execution and finish times are not necessarily in the order submitted.
