It was hard to find a good single-line title for my question, but the longer form is this.
I have an application which first does some heavy calculation on the CPU and then calls an OpenCL kernel. The two calculation jobs have nothing to do with each other; they just both have to be done. The CPU portion is slow and the GPU portion is fast.
This was initially developed using an Nvidia GTX 580 and, naturally, it didn't affect the running time of the kernel whether the CPU had been calculating before I called the kernel or had been idle. The kernel took the same time regardless.
Now I have started testing with an AMD Radeon HD 7970, and to my surprise it was running the kernel a bit slower than what I saw with the GTX 580.
After some investigation it turns out that the AMD card is faster than the Nvidia card if the CPU has not done any computation before launching the kernel, but if the CPU has been working before the kernel launch, the AMD card is slower.
The AMD kernel run took 9 ms if the CPU had been working and 3 ms if the CPU had been sitting idle.
The Nvidia kernel run took 8 ms if the CPU had been working and 8 ms if the CPU had been sitting idle.
I validated these measurements using CodeXL and it shows the same timing as I measured inside my program.
I tested various kinds of CPU work, and the slowdown happens even if I just keep repeating the same calculation on a single variable, so it doesn't seem to be related to the number of bytes being moved around in large buffers.
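The comparison described above can be sketched roughly as follows. This is an illustration rather than the actual application code (which is C# with Cloo): it is written in Python for brevity, `fake_kernel` is a hypothetical stand-in for "enqueue the OpenCL kernel and wait for it to finish", and the helper names, run counts, and durations are assumptions chosen for the example.

```python
import statistics
import time


def busy_cpu(seconds):
    """Spin on a single variable for roughly `seconds` (no large buffers involved)."""
    x = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        x = (x * 31 + 7) % 1000003
    return x


def time_call_ms(fn):
    """Return the wall-clock duration of one call to fn, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def compare(launch_kernel, runs=5, busy_seconds=0.05, idle_seconds=0.05):
    """Time launch_kernel after an idle period and after CPU work; return both medians."""
    after_idle, after_busy = [], []
    for _ in range(runs):
        time.sleep(idle_seconds)   # CPU sits idle before the launch
        after_idle.append(time_call_ms(launch_kernel))
        busy_cpu(busy_seconds)     # CPU works right before the launch
        after_busy.append(time_call_ms(launch_kernel))
    return statistics.median(after_idle), statistics.median(after_busy)


def fake_kernel():
    # Hypothetical placeholder: a real test would enqueue the kernel
    # and block until the device reports completion.
    time.sleep(0.005)


if __name__ == "__main__":
    idle_ms, busy_ms = compare(fake_kernel)
    print(f"after idle: {idle_ms:.2f} ms, after CPU work: {busy_ms:.2f} ms")
```

Interleaving the two cases and taking the median keeps a single outlier or a one-off warm-up cost from being mistaken for a systematic difference.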
Windows 8, Intel Core 2 Duo
C# (.NET) using Cloo
Latest drivers for AMD and Nvidia
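I must say I am really confused as to how this makes any sense. Why would CPU work done before the launch affect the kernel's running time on the AMD card but not on the Nvidia card?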