I am working with the AMD OpenCL SDK v2.2 and have written a 2D FFT in OpenCL. With a work-group size of 64, the code gave correct results on both the CPU and the GPU. After I increased the work-group size to 256, the modified code still gives correct results on the CPU but incorrect results on the GPU. Strangely, when I add a printf statement to the kernel, the GPU results become correct again. If the code works correctly on the CPU, shouldn't it be expected to work correctly on the GPU as well?