I've just downloaded the new driver and noticed that my program has become 3.7% slower than before. I have figured out that the problem is that the new driver won't let two kernels execute overlapped.
Here is what worked well before Catalyst 12.10:
at 90% of kernel1's execution time -> start kernel2
at 90% of kernel2's execution time -> start kernel3
and so on.
(each kernel takes around 0.4 seconds)
And this is how it worked in OpenCL: -> create 2 contexts and run each of the two overlapping kernels on a different context. A 20 ms timer checks for completion and launches the next kernel when needed.
Now this no longer works with Catalyst 12.10. It's 3.7% slower now because the CUs sit idle between kernels :S. It's like turning off one of the 32 CUs in an HD7970, just because of those bad kernel-to-kernel transitions.
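A rough back-of-the-envelope check of these numbers, assuming the entire 3.7% slowdown comes from idle time between consecutive 0.4 s kernels:

$$
\Delta t \approx 0.037 \times 400\,\text{ms} \approx 14.8\,\text{ms of idle time per kernel},
\qquad
\frac{1}{32} \approx 3.1\%,
\qquad
0.037 \times 32 \approx 1.2\ \text{CUs}.
$$

So the loss is slightly more than one idle CU's worth. Under the same assumption, roughly 15 ms per transition is of the same order as the 20 ms polling period: if a kernel finishes at a random point within a timer tick, the next launch waits about 10 ms on average before any launch overhead is added.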
So if anyone knows, please tell me: what is the proper way to keep the CUs filled with work ALL THE TIME?
Thanks in advance for any answers.