Hi,
I have three nested loops.
Now the question is whether to put the nested loops inside the kernel, so that each thread runs the full nested loop, or to keep the loops in host code and launch the kernel multiple times.
Which would yield better performance?
FYI: it's three nested for loops, and I have the option of putting them in either the kernel or the host code.
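
To make the two options concrete, here is a minimal sketch, assuming CUDA. The loop bounds (NI, NJ, NK), the work() function, and the kernel and helper names are all made up for illustration; the real loop body is different. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical loop bounds, for illustration only.
constexpr int NI = 4, NJ = 3, NK = 2;

// Hypothetical per-iteration work; stands in for the real loop body.
__device__ int work(int tid, int i, int j, int k) {
    return tid + i * NJ * NK + j * NK + k;
}

// Option A: the three loops live inside the kernel, one launch in total.
__global__ void loops_in_kernel(int *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int acc = 0;  // accumulates in a register, written to global memory once
    for (int i = 0; i < NI; ++i)
        for (int j = 0; j < NJ; ++j)
            for (int k = 0; k < NK; ++k)
                acc += work(tid, i, j, k);
    out[tid] = acc;
}

// Option B: the kernel does a single (i, j, k) iteration per launch.
__global__ void one_iteration(int *out, int n, int i, int j, int k) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] += work(tid, i, j, k);  // read-modify-write in global memory
}

// Runs one of the two options and copies the result back to the host.
std::vector<int> run(bool loops_on_host, int n) {
    assert(n > 0);
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemset(d_out, 0, n * sizeof(int));
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    if (loops_on_host) {
        // NI * NJ * NK separate launches, issued in order on the default stream.
        for (int i = 0; i < NI; ++i)
            for (int j = 0; j < NJ; ++j)
                for (int k = 0; k < NK; ++k)
                    one_iteration<<<blocks, threads>>>(d_out, n, i, j, k);
    } else {
        loops_in_kernel<<<blocks, threads>>>(d_out, n);
    }
    std::vector<int> h_out(n);
    // This blocking copy on the default stream waits for the launches above.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Both versions should produce the same result in this sketch. The difference is that the host-loop version issues NI * NJ * NK launches, each with its own launch overhead, and goes through global memory on every iteration, while the in-kernel version launches once and keeps the running value in a register.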