I'm currently working on an application that uses a "recursive" scheme, something like
N(i) = f( N(i-1) )
So the computation of a single sequence cannot be parallelized, but in my context I can use several independent 'N', i.e. I can start P threads, each computing its own sequence of N.
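To make the setup concrete, here is a minimal sketch in Python of what I mean. The update rule `f`, the helper names `run_chain` and `run_chains`, and the seed values are all placeholders I made up for illustration, and the Python threads only stand in for work-items; this is not my actual kernel.

```python
from concurrent.futures import ThreadPoolExecutor

def f(n):
    # Hypothetical update rule standing in for the real f.
    return (1103515245 * n + 12345) % 2**31

def run_chain(seed, steps):
    # One chain is strictly sequential: N(i) depends on N(i-1).
    n = seed
    for _ in range(steps):
        n = f(n)
    return n

def run_chains(seeds, steps):
    # The P chains do not depend on each other, so each one gets its own worker.
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return list(pool.map(lambda s: run_chain(s, steps), seeds))

# P = 4 independent sequences, each advanced 1000 steps.
results = run_chains([1, 2, 3, 4], 1000)
```

Each entry of `results` is the final N of one sequence; P is simply the number of seeds.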
The goal is to minimize the number of N, so I would like to know how many computations I can run in parallel.
In other words, what is the number of threads (work-items) I can launch together before reusing the same set of N?