For at least the last week I've been trying to figure out what's going wrong in my program, and I'm close to giving up.
Here's my story:
I have written an OpenCL kernel that calculates the Voigt function. That task comes down to numerically performing a convolution of a Gauss and a Lorentz function. The algorithm depends heavily on "Numerical Recipes", which is why I cannot post it here. Anyway, I suppose there's nothing wrong with the kernel itself.
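Since the actual kernel can't be posted, here is a minimal sketch of what "convolution of a Gauss and a Lorentz function" means numerically. It is not the Numerical Recipes algorithm and not my kernel, just a plain trapezoidal integration written as host-side C++. The function names, the 8-sigma integration limit and the default step count are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized Gaussian with standard deviation sigma.
double gauss(double x, double sigma) {
    const double pi = 3.14159265358979323846;
    return std::exp(-0.5 * x * x / (sigma * sigma)) / (sigma * std::sqrt(2.0 * pi));
}

// Normalized Lorentzian with half width at half maximum gamma.
double lorentz(double x, double gamma) {
    const double pi = 3.14159265358979323846;
    return gamma / (pi * (x * x + gamma * gamma));
}

// Voigt profile at x: integral of gauss(t) * lorentz(x - t) over t,
// approximated with the trapezoidal rule on [-8 sigma, 8 sigma]
// (assumed limit; the Gaussian is negligible outside it).
double voigt(double x, double sigma, double gamma, int steps = 4000) {
    assert(sigma > 0.0 && gamma > 0.0 && steps > 0);
    const double limit = 8.0 * sigma;
    const double h = 2.0 * limit / steps;
    // End points enter the trapezoidal sum with weight 1/2.
    double sum = 0.5 * (gauss(-limit, sigma) * lorentz(x + limit, gamma) +
                        gauss(limit, sigma) * lorentz(x - limit, gamma));
    for (int i = 1; i < steps; ++i) {
        const double t = -limit + i * h;
        sum += gauss(t, sigma) * lorentz(x - t, gamma);
    }
    return sum * h;
}

// Evaluate the profile on a uniform grid of `points` values in [x_min, x_max].
// Every grid point is independent of the others, which is what makes the
// problem map naturally onto one OpenCL work-item per point.
std::vector<double> voigt_profile(double x_min, double x_max, int points,
                                  double sigma, double gamma) {
    assert(points >= 2);
    std::vector<double> result(points);
    const double dx = (x_max - x_min) / (points - 1);
    for (int i = 0; i < points; ++i) {
        result[i] = voigt(x_min + i * dx, sigma, gamma);
    }
    return result;
}
```

A call such as voigt_profile(-10.0, 10.0, 5000, 1.0, 1.0) would correspond to the 5000-point case. Note that a fixed-step rule like this loses accuracy when gamma is much smaller than the step width, because the Lorentzian then becomes too narrow to be resolved.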
As C# is the programming language I'm most familiar with, I used CLOO (OpenCL C# bindings) to run the kernel. Calculating 5000 points of the Voigt function gives a speedup of about a factor of 50 when using an HD5850 instead of a Core i5 750. So far so good.
Then, for various reasons, I had to switch to C++. Running the same kernel, things have changed somewhat: CPU and GPU are now about equally slow. That is, the code running on the CPU has almost the same speed as with CLOO, while the code running on the GPU is always slightly slower still (about 10% - 20%). That I'm really using the GPU is easily confirmed by looking at the processor usage.
I really cannot imagine what's causing this behaviour. My first naive idea was that it's the fault of the C++ bindings, but it is not: doing the same thing in plain C results in the same behaviour.
I really don't do any fancy stuff. Initialization and running are done just the way you see it in the tutorials, though most of them use the CPU device. My suspicion is that there's more to do than just changing CL_DEVICE_TYPE_CPU to CL_DEVICE_TYPE_GPU. Is there? The funny thing is that the code runs and produces the right results; it's just far too slow.
Does anyone have an idea what's going wrong?
-> I just realized that I cannot access my source code right now, as it's still on my office PC. I'll post it as soon as I get hold of it again <-