12 Replies Latest reply on Apr 1, 2013 8:50 PM by himanshu.gautam

    Combination of Task Parallelism and Data Parallelism.


          In my image segmentation code, I have divided the image into 4 parts (i.e. if there are 4000 pixels, each part has 1000 pixels). I have 8 kernels in my code: the first four are to be executed in parallel, and the next four again in parallel, but only after the first four have finished. Is this possible if I use the same command queue for all 8 kernels, issue a clEnqueueNDRangeKernel command for each of the first four kernels, and pass the out-of-order flag (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) when creating the command queue? And if this is possible, how do I execute the next four kernels in parallel once the first four are done? Can I call clWaitForEvents after the first four kernels and then enqueue the next four? Will this guarantee that the first four kernels are executed in parallel, and that the next four are executed after them, also in parallel?

          I think clEnqueueTask would make my code slow, since I have about 1000 pixels in each kernel and clEnqueueTask fixes both global_work_size and local_work_size at 1.

         I am not sure whether all of this can be done, or what is right or wrong here, so I just need confirmation. If it cannot be done this way, please suggest an alternative.

        • Re: Combination of Task Parallelism and Data Parallelism.

          OUT_OF_ORDER has no effect on AMD platforms.

          On NVIDIA platforms, I think only Kepler cards support simultaneous kernel execution. But knowing NVIDIA and their love toward OpenCL, I am not too sure whether they have implemented out-of-order processing, multiple kernel execution, etc. in OpenCL.


          At least on NVIDIA platforms, I know that for simultaneous processing of multiple kernels, data transfers, etc., you need to use multiple queues. So enqueue the 4 independent kernels in 4 different queues.

          Enqueue the next set after all of these have finished (call clFinish() on each command queue)

          (or) construct an event list from all 4 kernel events and pass it as the event wait list of the clEnqueueNDRangeKernel() calls for the remaining 4 kernels.


          Hope this helps.
