I tried clAmdFft on different platforms/devices. I get correct results on AMD GPUs but incorrect results on NVIDIA GPUs. On CPUs, the results are correct only with Intel's OpenCL platform and incorrect with AMD's OpenCL platform. Am I doing something wrong, or is this a bug in the clAmdFft library (1.8.291)? See the attached source code. The correct answer for this 4-point FFT should be (16,20) (-8,0) (-4,-4) (0,-8).
Program output for various platform/device combinations:
platform: AMD Accelerated Parallel Processing, device: Tahiti: (16,20) (-8,0) (-4,-4) (0,-8)
platform: AMD Accelerated Parallel Processing, device: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz: (2,4) (6,16) (10,12) (14,8)
platform: Intel(R) OpenCL, device: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz: (16,20) (-8,0) (-4,-4) (0,-8)
platform: NVIDIA CUDA, device: Tesla K10.G2.8GB: (6,8) (-4,-4) (6,8) (-4,-4)
platform: NVIDIA CUDA, device: Tesla K10.G2.8GB: (6,8) (-4,-4) (6,8) (-4,-4)
platform: NVIDIA CUDA, device: GeForce GTX 680: (6,8) (-4,-4) (6,8) (-4,-4)