Refer to the OpenCL 1.1 spec. You can use barriers, but only within a work-group.
You cannot use clEnqueueTask inside kernels.
Thanks!! Also, one more question: is out-of-order execution supported yet? And does clEnqueueTask work if independent kernels (as many as there are cores in the machine) are queued? Will utilization reach 100% without using device fission?
Out-of-order queues are currently not supported by the AMD implementation.
I don't understand why you are interested in clEnqueueTask when clEnqueueNDRangeKernel gives you more programmability. I have never used clEnqueueTask so cannot say anything for sure.
But as per the spec:
"clEnqueueTask is equivalent to calling clEnqueueNDRangeKernel with global_work_size set to 1, and local_work_size set to 1."
So it should not be possible to run different tasks on different compute units of a GPU. Also, device fission is only available for CPUs, so there you should be able to run many kernels.
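For reference, here is a rough host-side sketch in C of the equivalence from the spec text quoted above. The helper name enqueue_single_work_item is made up for illustration and is not part of OpenCL. It assumes queue and kernel were created elsewhere and that the kernel arguments are already set. It needs the OpenCL headers and library to build.

```c
#include <CL/cl.h>

/* Hypothetical helper (not an OpenCL API): enqueue `kernel` as a single
 * work-item in a single work-group, which is what the quoted spec text
 * says clEnqueueTask amounts to. */
static cl_int enqueue_single_work_item(cl_command_queue queue, cl_kernel kernel)
{
    const size_t global_work_size[1] = { 1 };  /* one work-item in total */
    const size_t local_work_size[1]  = { 1 };  /* one work-item per work-group */

    return clEnqueueNDRangeKernel(queue, kernel,
                                  1,           /* work_dim */
                                  NULL,        /* global_work_offset */
                                  global_work_size,
                                  local_work_size,
                                  0, NULL,     /* no events to wait on */
                                  NULL);       /* no event returned */
}
```

The return value is the usual cl_int status, so check it against CL_SUCCESS just as you would after a clEnqueueTask call.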