Recommended way to overlap OpenCL kernels on HD7970 with Catalyst 12.10

Question asked by realhet on Oct 25, 2012
Latest reply on Nov 19, 2012 by nou



I've just downloaded the new driver and noticed that my program has become 3.7% slower than before. I have figured out that the problem is that the new driver won't let two kernels overlap in execution.


Here is what worked well before Catalyst 12.10:

start kernel1

at 90% of kernel1's execution time -> start kernel2

at 90% of kernel2's execution time -> start kernel3

and so on.

(each kernel runs for about 0.4 seconds)
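As a worked version of the schedule above, assume (for this idealized sketch) that every kernel takes the same time T ≈ 0.4 s and that each "90%" is measured from the start of the previous kernel. Kernel k, counting from 0, is then launched at t_k, and the previous kernel is expected to finish slightly later:

$$
t_k = k \cdot 0.9\,T, \qquad t_{k-1} + T = t_k + 0.1\,T, \qquad 0.1\,T \approx 0.1 \times 0.4\ \text{s} = 40\ \text{ms}
$$

So each new kernel is queued about 40 ms before its predecessor should end (t_1 = 0.36 s, t_2 = 0.72 s, ...), which presumably lets it pick up CUs as they are released during the predecessor's last workgroups.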


And this is how it worked in OpenCL: create 2 contexts and put each of the two overlapping kernels on a different context. A 20 ms timer checks for completion and launches a new kernel when needed.


Now this no longer works with Catalyst 12.10. The program is 3.7% slower because the CUs sit idle between kernels :S. It's like turning off one of the 32 CUs in an HD7970, just because of those bad kernel-to-kernel transitions.
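As a rough check on these numbers, assume "3.7% slower" means each 0.4 s kernel now takes 3.7% longer end to end. The extra idle time per kernel and the cost of disabling one CU out of 32 are then:

$$
\Delta t \approx 0.037 \times 0.4\ \text{s} = 0.0148\ \text{s} \approx 15\ \text{ms per kernel}, \qquad \frac{1}{32} = 3.125\%
$$

So the loss amounts to roughly 15 ms of idle GPU time per kernel transition, and the "one CU switched off" comparison (about 3.1%) is in the same range as the measured 3.7%.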

So if anyone knows, please tell me: what is the proper way to keep the CUs filled with work ALL THE TIME?


Thank you for any answers.