When I run my program in CodeXL with the exact same code and options, I get different results than when I run it on the console without CodeXL. My question is whether this is possible at all, or whether I should start looking for a bug. The results are not completely wrong, they are just a little bit off.
The OpenCL code makes heavy use of floating-point arithmetic and trigonometric functions, and I've seen before that those functions tend to produce different results on different devices, or even on the same device with different implementations, especially when using float values. Is this the case here or am I missing something?
In short, the answer is yes: CodeXL causes the kernel to be built differently.
If the program shows different results on different hardware or implementations, especially when floating-point arithmetic is used heavily, as you said, it's probably due to differences in rounding conventions and similar details, which can vary between builds.
When debugging the application in CodeXL, the CodeXL OpenCL server modifies the build flags in preparation for possible kernel debugging. One of the changes is adding flags that disable optimizations, to allow for a natural-feeling debugging experience (when performing optimizations, the compiler might, for example, re-order code or remove variables that are not needed).
If you see different results with different devices, it's completely possible that you'll see different results with CodeXL vs. without it.
You might want to pass "-cl-opt-disable" in clBuildProgram's options parameter, and see if the results with and without CodeXL are the same in that case.
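As a rough sketch of that suggestion: the helper below only assembles the options string, so it compiles without the OpenCL headers. The name make_build_options and the example base flag are made up for illustration; the actual clBuildProgram call is shown in a comment and assumes you already have a valid program and device.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL or CodeXL): writes base_opts,
 * optionally followed by "-cl-opt-disable", into out.
 * Returns 0 on success, -1 if out is too small. */
int make_build_options(const char *base_opts, int disable_opt,
                       char *out, size_t out_size)
{
    const char *base = base_opts ? base_opts : "";
    const char *flag = disable_opt ? "-cl-opt-disable" : "";
    /* Only add a separating space when both parts are non-empty. */
    const char *sep = (base[0] != '\0' && flag[0] != '\0') ? " " : "";

    int n = snprintf(out, out_size, "%s%s%s", base, sep, flag);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Intended use (assumes `program` and `device` were created elsewhere):
 *
 *   char opts[256];
 *   if (make_build_options("-cl-mad-enable", 1, opts, sizeof opts) == 0) {
 *       cl_int err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *   }
 */
```

Building with the same string in both runs at least rules out the optimization flag as the source of the difference.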
Thank you for the insight. Unfortunately, the results also differ with the "-cl-opt-disable" flag. Are there any other ways in which I can try to make them equal?
My fear right now is that the Profiler changes some timings, so that buffers are copied a little later and only then contain the actual correct results. Both results seem correct, but I would like to be sure, and comparing them by hand is pretty cumbersome given the amount of data typical for OpenCL (10,000 items and more).
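To make the comparison less cumbersome, something along these lines might help. It is only a sketch: it assumes both runs dump their output into plain float arrays of equal length, and the names (diff_report, compare_results) and the tolerance formula are my own choices, not anything provided by CodeXL or OpenCL.

```c
#include <stddef.h>

/* Summary of how two result buffers differ. */
typedef struct {
    size_t mismatches;    /* elements outside the tolerance */
    size_t worst_index;   /* index of the largest absolute difference */
    float  max_abs_diff;  /* largest absolute difference seen */
} diff_report;

static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Compares a[i] and b[i] for i in [0, n). An element counts as a mismatch
 * when |a - b| > abs_tol + rel_tol * max(|a|, |b|). NaN values also count
 * as mismatches, because the comparison below is false for them. */
diff_report compare_results(const float *a, const float *b, size_t n,
                            float abs_tol, float rel_tol)
{
    diff_report r = {0, 0, 0.0f};
    for (size_t i = 0; i < n; ++i) {
        float diff = abs_f(a[i] - b[i]);
        float ma = abs_f(a[i]);
        float mb = abs_f(b[i]);
        float scale = ma > mb ? ma : mb;

        if (!(diff <= abs_tol + rel_tol * scale)) {
            r.mismatches++;
        }
        if (diff > r.max_abs_diff) {
            r.max_abs_diff = diff;
            r.worst_index = i;
        }
    }
    return r;
}
```

If the run with CodeXL and the run without it differ only by a few units in the last place, a small tolerance should report zero mismatches. A handful of large differences clustered in one region of the buffer would point more towards a timing or synchronization problem than towards rounding.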