

Combination of Task Parallelism and Data Parallelism.

    In my image segmentation code, I have divided the image into 4 parts (i.e. if there are 4000 pixels, each part is 1000 pixels). I have 8 kernels in my code: the first 4 should execute in parallel, and the next 4 should also execute in parallel, but only after the first four have finished. Is this possible if I use the same command queue for all 8 kernels, issue a clEnqueueNDRangeKernel call for each of the first four kernels, and set the out-of-order property (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) when creating the command queue? If so, how do I run the next four kernels in parallel after the first four? Can I call clWaitForEvents after enqueuing the first four kernels and then enqueue the next four? Will this guarantee that the first four kernels execute in parallel, and that the next four execute after them, also in parallel?

    I think clEnqueueTask would make my code slow, since each kernel handles about 1000 pixels and clEnqueueTask restricts both the global work size and the local work size to 1.

    I am not sure whether all of this can be done, or what is right and what is wrong, so I just need confirmation. If it cannot be done this way, please suggest an alternative.

Accepted solution:

If you can do the job with a single kernel, that is the best option. Leave the parallelization to OpenCL and don't bother with it yourself. Each kernel launch carries a small startup overhead, so four kernels have a higher overhead than a single one. What led you to believe that your idea would bring any speed-up?

