I am trying to run 5 different kernels on a 24-core CPU machine (24 compute units). I am using clEnqueueTask() to enqueue all the kernels to the CPU. In theory this should use 5 cores, but it only ever uses 1 core. Interestingly, if I use different command queues, usage goes up to 2-3 cores, but still not the maximum.
I would like to know if anybody has used task parallelism in OpenCL; if so, kindly shed some light on my problem.