I tried clAmdFft on different platforms and devices. I get correct results on AMD GPUs but incorrect results on NVIDIA GPUs. On CPUs, the results are correct only when using Intel's OpenCL platform, and incorrect when using AMD's OpenCL platform. Am I doing something wrong, or is this a bug in the clAmdFft library (1.8.291)? See the attached source code. The correct answer for this 4-point FFT should be (16,20) (-8,0) (-4,-4) (0,-8).
Program output for various platform/device combinations:
platform: AMD Accelerated Parallel Processing, device: Tahiti: (16,20) (-8,0) (-4,-4) (0,-8)
platform: AMD Accelerated Parallel Processing, device: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz: (2,4) (6,16) (10,12) (14,8)
platform: Intel(R) OpenCL, device: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz: (16,20) (-8,0) (-4,-4) (0,-8)
platform: NVIDIA CUDA, device: Tesla K10.G2.8GB: (6,8) (-4,-4) (6,8) (-4,-4)
platform: NVIDIA CUDA, device: Tesla K10.G2.8GB: (6,8) (-4,-4) (6,8) (-4,-4)
platform: NVIDIA CUDA, device: GeForce GTX 680: (6,8) (-4,-4) (6,8) (-4,-4)
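For anyone who wants to check the expected values independently of any OpenCL runtime, here is a minimal reference sketch using a naive forward DFT. The input samples are an assumption: the attached program is not reproduced in this thread, and (1,2) (3,4) (5,6) (7,8) was chosen only because it reproduces the expected output quoted above. The helper names `dft` and `format_pairs` are hypothetical and are not part of clAmdFft.

```python
import cmath


def dft(x):
    """Naive O(N^2) forward DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]


def format_pairs(values):
    """Format complex values as (re,im) pairs, rounded to the nearest integer."""
    return " ".join(f"({round(z.real)},{round(z.imag)})" for z in values)


# ASSUMPTION: input inferred from the expected output, not taken from the attached source.
samples = [1 + 2j, 3 + 4j, 5 + 6j, 7 + 8j]

# Expected values as stated in the original post.
expected = [16 + 20j, -8 + 0j, -4 - 4j, 0 - 8j]

result = dft(samples)

# Compare with a tolerance, since floating-point rounding leaves tiny residues.
for got, want in zip(result, expected):
    assert abs(got - want) < 1e-9

print(format_pairs(result))
```

With this assumed input, only the Tahiti and Intel OpenCL lines above agree with the reference. Comparing with a tolerance rather than exact equality is the usual choice when checking FFT output from different devices.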
"incorrect when using AMD's OpenCL platform":
Could you give more detail on the software?
This is a small program that just performs a simple 4-point FFT. It illustrates that the clAmdFft library returns wrong results on NVIDIA devices (as well as on CPUs when using the AMD OpenCL runtime). As I do not have the source of the library, I cannot see what goes wrong.
Hi romein,
Thanks for the test case. This is a known issue, and the problem is not in the library code. It is in AMD's OpenCL runtime and driver stack, and it affects only the CPU device. You may have guessed that the problem is not in the library code, since it works on the GPU with our OpenCL platform and also on the CPU with Intel's OpenCL platform.
The problem has been fixed, but unfortunately the fix will be available only in the next Catalyst driver release.