Hi, I am new to OpenCL. I just wanted to know: why is task parallelism restricted to a single thread?
Actually, clEnqueueTask is a simpler form of clEnqueueNDRangeKernel that enqueues a kernel for a single work-item (thread). It is equivalent to calling clEnqueueNDRangeKernel with work_dim = 1, global_work_offset = NULL, global_work_size set to 1, and local_work_size set to 1. So there is no restriction; it's just another form of the same call.
FYI, clEnqueueTask is deprecated as of OpenCL 2.0.
Thanks for the reply Dipak.
I understand clEnqueueTask; however, my question has to do with task parallelism.
Can we run two or more different kernels (each with more than one work-item) simultaneously on different compute units?
Yes, you can, if the tasks are independent. However, you should use multiple command queues to enqueue the kernels/tasks to the device. This is especially effective on cards that have hardware support for handling multiple queues and commands simultaneously.
Otherwise, if you push many commands into a single in-order queue, you will not get the concurrency you want: the runtime/driver may serialize the commands even though they are independent of each other. OpenCL does support out-of-order queues, but they are not a mandatory feature (for host-side queues), and an implementation may ignore the out-of-order request. In that case, you will not see any performance difference from an in-order queue.
One more point: a task is only a conceptual thing. Tasks are written the same way as any other kernel, so they can be launched with a single work-item or with multiple work-items, as the application requires. To launch more than one work-item, clEnqueueNDRangeKernel is the only choice.