Do all shader cores of a GPU handle numerical computations in exactly the same way? I've hit a strange phenomenon: when I execute the same kernel several times on the same deterministic input, the output varies slightly from run to run. This happens only on my GPU (HD5450), not when I execute it on any of my (Intel, multicore) CPUs.
For a while I assumed it must be a concurrency issue. But it never seems to occur on a CPU (even a multicore one), and after staring at this tiny kernel for many days I still can't find a race, so I'm starting to wonder whether there is a different explanation after all.
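To make concrete what I mean by a concurrency issue changing the numbers: float addition is not associative, so if work-items contribute to a sum in a different order on each run, the result can differ slightly even though every input is identical. The following is only a minimal sketch of that effect in plain C, not the code from bug.cl; the function name `sum_in_order` and the sample values are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from bug.cl): adds the values in the
 * sequence given by `order`. The accumulator is volatile so that every
 * partial sum is rounded to single precision, as it would be when
 * stored to a float buffer. */
float sum_in_order(const float *values, const size_t *order, size_t n)
{
    volatile float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc = acc + values[order[i]];
    }
    return acc;
}

/* Illustrative data: one small term between two large terms that cancel. */
static const float  sample_values[3] = { 1e8f, 1.0f, -1e8f };
static const size_t order_a[3]       = { 0, 1, 2 };  /* (1e8 + 1) - 1e8 */
static const size_t order_b[3]       = { 0, 2, 1 };  /* (1e8 - 1e8) + 1 */
```

With `order_a` the 1.0f is absorbed when it is added to 1e8f, because adjacent floats are 8 apart at that magnitude, so the two orders do not give the same sum. If the kernel accumulates in an order that depends on scheduling, this alone might explain small run-to-run differences, but I have not confirmed that this is what is happening here.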
A simple NetBeans project that triggers the instability on the GPU can be found at this link: http://dl.dropbox.com/u/3060536/WeirdOpenCLbug.rar (the kernel is in ./src/bug.cl).