I have a kernel that runs fine on the GPU, but when I run it on the CPU all the results become NaN. I am not using any extensions, and I have tried both the AMD and the Intel SDK for running it on the CPU. I also don't get any error message. Other kernels work fine on both GPU and CPU.
Is there any OpenCL 1.1 function that could behave differently on CPU and GPU?
I first suspected the native_xxx functions, but after changing those to the non-native versions I still got NaN values. Should the native_xxx functions work on CPUs in general?
I have tried three different CPUs (an Intel Core 2 Duo and two different Core i7s).
I am not sure, but the native_xxx functions are part of the OpenCL standard, so they must be mapped to something on CPUs. In general, CPU and GPU can give different results when the kernel has synchronization issues. It would help if you could share your kernel.
Thanks. Unfortunately I cannot share the kernel. I think I ruled out synchronization issues by running it with a global work size of 1, which still gave the correct result on GPU and NaN on CPU. I will try it again, just to make sure.
Any other ideas?
I used native_powr(x,y) where x < 0. Apparently that worked fine on the three different AMD GPU types I have in use, but on the CPU it gives NaN.
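The spec says that native_powr requires x >= 0, so the result for negative x is not guaranteed. Using pow(x,y) works, since it handles negative x when y has an integer value (in my case I am actually using pown(x,y)).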