Parallel Models

Question asked by jaidotsh on Nov 23, 2012
Is there a way to efficiently parallelize a serial algorithm using OpenCL when the outer loop (say, of size M) has far fewer iterations than the inner loop (say, of size N)? The algorithm I'm currently dealing with has M << N.


By "efficiently," I mean either launching more threads (difficult, since the outer loop is small) or exploiting more instruction-level parallelism.
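
For illustration, the loop shape in question might look like the plain C sketch below. It assumes the inner iterations are independent of one another, which the question does not state. `body`, `run_serial`, and `run_flattened` are hypothetical names, and the arithmetic in `body` is only a placeholder for the real work. The second function shows one common way to expose more threads: collapse both loops into a single index of M * N elements, so the thread count no longer depends on M alone.

```c
#include <stddef.h>

/* Hypothetical per-element work; a placeholder for the real loop body. */
static float body(size_t i, size_t j) {
    return (float)(i * 1000 + j);
}

/* Serial form: outer loop of M iterations, inner loop of N, with M << N. */
void run_serial(float *out, size_t M, size_t N) {
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            out[i * N + j] = body(i, j);
        }
    }
}

/* Flattened form: one index per (i, j) pair.
 * In an OpenCL kernel, gid would come from get_global_id(0) on a 1D
 * NDRange of M * N work-items, and this for loop would disappear. */
void run_flattened(float *out, size_t M, size_t N) {
    for (size_t gid = 0; gid < M * N; ++gid) {
        size_t i = gid / N;  /* recovered outer index */
        size_t j = gid % N;  /* recovered inner index */
        out[i * N + j] = body(i, j);
    }
}
```

Both functions fill `out` with the same values. If the inner loop carries a dependency (for example, a running sum over j), this flattening does not apply directly and a per-row reduction would be needed instead.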