Hi,

I'm currently working on an application that uses a "recursive" scheme, something like

N(i) = f( N(i-1) )

So a single sequence cannot be parallelized, but in my context I can use several independent 'N', i.e. I can start P threads, each computing its own sequence of N.

The goal is to minimize the number of N, so I would like to know how many computations I can run in parallel.

i.e. what is the number of threads (work-items) I can launch together before reusing the same set of N?

The simple rule is: as many work-items as you can. On older cards, the absolute minimum number of work-items is the number of compute units * 64. On the newer GCN architecture (e.g. the 7xxx series), it is the number of compute units * 64 * 4.
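
As a rough illustration of that rule, here is a minimal Python sketch of the arithmetic. The function name min_work_items and its gcn flag are hypothetical, made up for this example; the compute unit count is assumed to come from the device itself (in OpenCL, clGetDeviceInfo with CL_DEVICE_MAX_COMPUTE_UNITS). Treat the result as a lower bound to fill the device, not a tuned value.

```python
def min_work_items(compute_units, gcn=False):
    """Rule-of-thumb lower bound on work-items needed to fill the device.

    compute_units: number of compute units reported by the device
                   (assumed to be queried beforehand, e.g. through
                   CL_DEVICE_MAX_COMPUTE_UNITS in OpenCL).
    gcn:           True for GCN cards (7xxx series), False for older ones.
    """
    wavefront_size = 64          # the "* 64" factor from the rule above
    factor = 4 if gcn else 1     # the extra "* 4" factor applies to GCN only
    return compute_units * wavefront_size * factor


if __name__ == "__main__":
    # Example with made-up inputs: 20 compute units on an older card,
    # 32 compute units on a GCN card.
    print(min_work_items(20))
    print(min_work_items(32, gcn=True))
```

In the context of the question, this number would be the smallest P (number of independent sequences of N) worth launching; going above it is fine and usually helps hide memory latency.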