
Archives Discussions

jaidotsh
Staff

Parallel Models

Hi,

Is there a way to efficiently parallelize a serial algorithm using OpenCL when the outer loop (say of size M) has far fewer iterations than the inner loop (say of size N)? The algorithm I'm currently dealing with has M<<N.

By efficiently, I mean either launching more threads (difficult, since the outer loop is small) or exploiting more instruction-level parallelism.

Thanks

4 Replies
nou
Exemplar

You have four options. First, convert the outer loop to an NDRange and keep the inner loop as a for loop in the kernel. But that can lead to low utilization, as you need a global size of more than about 1000 to get good utilization.

Second, rewrite the algorithm so that the outer and inner loops are switched, then handle it like the first case. That way you get a large enough global size.

Third, convert it to a 2D NDRange so there is no loop in the kernel.

And last, execute M kernel invocations, each with a global size of N.

Please tell me if I got this right.

Say M=4 and N=256. In reality there are more nested loops than this.

  • Launch an NDRange with 4 threads. My kernel would look like

          for(N iterations){
               ....
          }

  • Change the algorithm
  • Convert the NDRange to 2D. This may not work because I'm already using 1D device arrays.
  • Launch multiple NDRanges

          for(M iterations)
               NDRange(...global=N....)
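
To make the difference between the four layouts concrete, here is a small sketch in plain C that only tabulates how much parallelism each one exposes. The `launch_shape` struct and `shape_for_option` function are hypothetical names made up for this illustration, and nothing here calls the OpenCL API. The options are numbered 1 to 4 in the same order as the list above.

```c
#include <stddef.h>

/* Hypothetical helper type: the shape of the work handed to the device. */
typedef struct {
    size_t launches;        /* number of kernel enqueues */
    size_t global_size;     /* work-items per enqueue */
    size_t loop_iterations; /* serial loop trips inside each work-item */
} launch_shape;

/* m is the outer loop size, n the inner loop size (m << n in this thread). */
launch_shape shape_for_option(int option, size_t m, size_t n)
{
    launch_shape s = {0, 0, 0};
    switch (option) {
    case 1: /* outer loop becomes the NDRange, inner loop stays in the kernel */
        s.launches = 1; s.global_size = m; s.loop_iterations = n;
        break;
    case 2: /* loops swapped: inner loop becomes the NDRange */
        s.launches = 1; s.global_size = n; s.loop_iterations = m;
        break;
    case 3: /* 2D NDRange: one work-item per (outer, inner) pair, no loop */
        s.launches = 1; s.global_size = m * n; s.loop_iterations = 1;
        break;
    case 4: /* one enqueue per outer iteration */
        s.launches = m; s.global_size = n; s.loop_iterations = 1;
        break;
    default: /* unknown option: leave everything at zero */
        break;
    }
    return s;
}
```

With M=4 and N=256, option 1 exposes only 4 work-items, options 2 and 4 expose 256 per launch, and option 3 exposes 1024 in a single launch. The total amount of work (launches × global size × loop iterations) is M×N in every case; only how it is spread over work-items changes.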


 


  • Yes
  • Yes
  • That is not a problem. You just linearize the address: array[y*width+x]
  • You made a typo: for(M iterations) clEnqueueNDRangeKernel(global=N);

The third option can be the best because you get the biggest global work size, but your problem must be totally parallel. If there is a dependency on the outer loop, then the fourth option is the best.
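
As a rough sketch of the linearized addressing, the plain C below emulates a 2D NDRange on the host with two loops and checks it against the original serial nest. It is an illustration under assumptions, not OpenCL host code: `work_item_2d`, `run_2d_range`, and `run_serial` are hypothetical names, and the doubling of each element is just a placeholder computation. In a real kernel, `x` and `y` would come from `get_global_id(0)` and `get_global_id(1)`.

```c
#include <stddef.h>

/* Body of a hypothetical 2D kernel for a single work-item.
 * The buffers stay 1D; only the index is built from two coordinates. */
static void work_item_2d(size_t x, size_t y, size_t width,
                         const float *in, float *out)
{
    size_t idx = y * width + x;   /* linearized address */
    out[idx] = 2.0f * in[idx];    /* placeholder computation */
}

/* Host-side stand-in for a 2D NDRange of global size (width, height).
 * On a device these iterations would run as independent work-items. */
void run_2d_range(size_t width, size_t height, const float *in, float *out)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            work_item_2d(x, y, width, in, out);
}

/* Reference: the original serial nest, outer size m and inner size n. */
void run_serial(size_t m, size_t n, const float *in, float *out)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            out[i * n + j] = 2.0f * in[i * n + j];
}
```

Calling `run_2d_range(N, M, in, out)` (width N for the inner loop, height M for the outer loop) writes the same elements as `run_serial(M, N, in, out)`, so existing 1D device arrays do not need to change. This only holds when the iterations are independent of each other.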


Fixed!

Yes, my problem is embarrassingly parallel. The third option seems like a good idea. I will try that out. Thanks for the help.
