
Re: clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors
bragadeesh Jun 26, 2012 4:09 PM (in response to myrv)
Hi myrv,
Thanks for posting. I assume you are using the OpenCL GPU target (7970). What version of the graphics driver and APP SDK are you using? You can upgrade to the 1.8 version of the FFT libraries and see if it works for you. But please understand that the libraries are still 'beta' only on the 7000 series cards. There are still some known issues in our software layers that prevent a full release.
We'll perform testing of those specific sizes locally and give an update if there are any more details.

myrv Jun 27, 2012 3:15 PM (in response to bragadeesh)
Yes, I'm using the 7970 as my OpenCL target. It is a stock PowerColor 3GB card running at stock speeds.
I am using the AMDAPP SDK 2.7.
The video drivers are reported as:
AMD Radeon HD 7900 Series:
BIOS Version: 015.012.000.004.000346
Catalyst Version: 12.4
Driver Packaging Version: 8.961-120405a-137531C-ATI
2D Driver Version: 8.96.4
OpenGL Version: 4.2.11631 Compatibility Profile Context
My machine is a RHEL 5.5 machine:
uname -a
Linux 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
The Xserver is as follows:
X Window System Version 7.1.1
Release Date: 12 May 2006
X Protocol Version 11, Revision 0, Release 7.1.1
Build Operating System: Linux 2.6.18-164.11.1.el5 x86_64 Red Hat, Inc.
Build ID: xorg-x11-server 1.1.1-48.76.el5
Further testing with a simplified version of my code still shows problems with clAmdFft 1.6 at vector sizes 2^20 and 2^21. It also showed issues with 2^16, but these were far more random: ~80% of the time 2^16 would work, and the other 20% it would either return an incorrect answer or lock up my machine. I am, however, new to all this, so I may have screwed something up. I can send my code, but I'm sure the last thing you want to do is debug someone else's code.
The following run created 5 random interleaved complex (float) vectors, then performed a forward and backward transform on each and compared the result to the original (I also tried using doubles and the results were essentially the same):
clFFT client API version: 1.6.244
clFFT runtime version: 1.6.244
Size (2^19)= 524288
mem_size: 4194304
RMSE: 2.17056e-07
RMSE: 2.12428e-07
RMSE: 2.13238e-07
RMSE: 2.07946e-07
RMSE: 2.22984e-07
Size (2^20)= 1048576
mem_size: 8388608
RMSE: 0.247278
RMSE: 0.242976
RMSE: 0.254346
RMSE: 0.252244
RMSE: 0.246728
Size (2^21)= 2097152
mem_size: 16777216
RMSE: 0.245888
RMSE: 0.248527
RMSE: 0.257636
RMSE: 0.253573
RMSE: 0.251165
Size (2^22)= 4194304
mem_size: 33554432
RMSE: 2.43334e-07
RMSE: 2.26586e-07
RMSE: 2.33932e-07
RMSE: 2.28084e-07
RMSE: 2.28967e-07
Switching to clAmdFft1.8.269 seems to fix the issue. Exactly the same code, just compiled against the new library:
clFFT client API version: 1.8.269
clFFT runtime version: 1.8.269
Size (2^19)= 524288
mem_size: 4194304
RMSE: 2.17056e-07
RMSE: 2.12428e-07
RMSE: 2.13238e-07
RMSE: 2.07946e-07
RMSE: 2.22984e-07
Size (2^20)= 1048576
mem_size: 8388608
RMSE: 2.1757e-07
RMSE: 2.15285e-07
RMSE: 2.28142e-07
RMSE: 2.40964e-07
RMSE: 2.28345e-07
The only odd thing I found with 1.8.269 is that double-precision transforms of vectors of length 2^16 report exactly the same RMSE for every random input, which is puzzling to say the least.
clFFT client API version: 1.8.269
clFFT runtime version: 1.8.269
Size (2^16)= 65536
mem_size: 1048576
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
Anyway, thanks for the response. I'll keep looking at my code but I can't see anything I've obviously done wrong.
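
For readers following along, the round-trip check described above (random interleaved complex input, forward transform, backward transform, RMSE against the original) can be sketched on the CPU as follows. This is not the attached dsimpleFFT.cpp and it does not call clAmdFft: `fft_radix2` is a plain reference radix-2 FFT standing in for the library, and the names `roundtrip_rmse` and `interleaved_bytes`, the 1/N scaling applied after the inverse, and the RMSE definition (root of the mean squared complex difference) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

typedef std::complex<double> cplx;

// Bytes needed for n interleaved complex values (real, imag pairs).
// Hypothetical helper; it reproduces the mem_size figures in the logs.
std::size_t interleaved_bytes(std::size_t n, std::size_t scalar_size) {
    return n * 2 * scalar_size;
}

// In-place iterative radix-2 FFT. CPU reference only, NOT clAmdFft.
// inverse == true computes the unnormalized inverse transform.
void fft_radix2(std::vector<cplx>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // length must be a power of two

    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    // Butterfly passes of increasing length.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang =
            (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        const cplx wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cplx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cplx u = a[i + k];
                const cplx v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Forward + backward transform of a random vector of length 2^log2n,
// returning the RMSE between the round-tripped data and the original.
double roundtrip_rmse(unsigned log2n, unsigned seed) {
    const std::size_t n = static_cast<std::size_t>(1) << log2n;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);

    std::vector<cplx> original(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double re = dist(gen);
        const double im = dist(gen);
        original[i] = cplx(re, im);
    }

    std::vector<cplx> work(original);
    fft_radix2(work, false);
    fft_radix2(work, true);

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Undo the factor of n left by the unnormalized inverse.
        const cplx diff = work[i] / static_cast<double>(n) - original[i];
        sum += std::norm(diff);  // squared magnitude of the difference
    }
    return std::sqrt(sum / static_cast<double>(n));
}
```

Calling `roundtrip_rmse(16, seed)` with a few different seeds gives a CPU baseline to compare against the GPU numbers, and changing the seed is a quick way to confirm that each iteration really sees different input.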

bragadeesh Jun 27, 2012 3:39 PM (in response to myrv)
Hi myrv,
Glad to see that 1.8 is working for you. The RMSE values for length 2^16 look puzzling indeed. Let us do some experiments locally and see what is going on. Please post your code if possible and we will help wherever we can.

myrv Jun 27, 2012 4:56 PM (in response to bragadeesh)
Here is the source for my double precision version. It makes use of clAmdFft.openCL.cpp from the clAmdFft library sample directory (I've included the shared sample files in the attachments, but you should be able to use the ones from the library itself). Actually, a lot of the code itself was lifted from the FFT samples.
It was built with g++44 (GCC) 4.4.0 20090514 (ya, old, but it's what the company provides) using the following:
g++44 -I/opt/clAmdFft1.8.269/include -I/opt/AMDAPP/include dsimpleFFT.cpp clAmdFft.openCL.cpp -o dsimplefft -L/opt/clAmdFft1.8.269/lib64 -lclAmdFft.Runtime -lm
You should be able to run it by providing a power of two and number of iterations as command line parameters:
$ ./dsimplefft 16 5
clFFT client API version: 1.8.269
clFFT runtime version: 1.8.269
Size (2^16)= 65536
mem_size: 1048576
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
It will be interesting to see if you get a different result.

dsimpleFFT.cpp.zip 2.0 KB

clAmdFft_shared.tgz 5.8 KB

bragadeesh Jul 3, 2012 2:14 PM (in response to myrv)
Hi,
I ran your code and was able to reproduce the same RMSE across multiple runs for 2^16 FFT sizes. I examined the inputs and outputs and everything seems fine. For most input vectors at size 2^16, the library seems to compute very accurate FFTs for some reason, and hence the same minimal RMSE repeats. Also, random inputs are not good real-world cases and they are not good at spotting real issues. I have verified the library output at 2^16 size with some meaningful input vectors against popular libraries such as ACML and FFTW, and I am convinced things are working correctly. If you run into further issues, please contact us.
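
One way to build a "meaningful input vector" of the kind mentioned above is a single complex tone, whose forward DFT is known exactly: n at the tone's bin and zero everywhere else. The sketch below is only an illustration of that idea, not the test that was actually run against ACML and FFTW. The direct O(n^2) `dft_naive` stands in for whichever transform is under test, and `make_tone` and `tone_max_error` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;

// Stand-in for the transform under test: a direct O(n^2) forward DFT.
std::vector<cplx> dft_naive(const std::vector<cplx>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n so the angle stays small and accurate.
            const double ang = -2.0 * pi * static_cast<double>((k * t) % n) /
                               static_cast<double>(n);
            acc += x[t] * cplx(std::cos(ang), std::sin(ang));
        }
        out[k] = acc;
    }
    return out;
}

// Single complex tone: x[t] = exp(+2*pi*i*bin*t/n).
std::vector<cplx> make_tone(std::size_t n, std::size_t bin) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> x(n);
    for (std::size_t t = 0; t < n; ++t) {
        const double ang = 2.0 * pi * static_cast<double>((bin * t) % n) /
                           static_cast<double>(n);
        x[t] = cplx(std::cos(ang), std::sin(ang));
    }
    return x;
}

// Largest absolute deviation of the computed spectrum from the analytic
// one (value n at index `bin`, zero at every other index).
double tone_max_error(std::size_t n, std::size_t bin) {
    assert(bin < n);
    const std::vector<cplx> spectrum = dft_naive(make_tone(n, bin));
    double worst = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        const cplx expected =
            (k == bin) ? cplx(static_cast<double>(n), 0.0) : cplx(0.0, 0.0);
        const double err = std::abs(spectrum[k] - expected);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Because the expected spectrum is known in closed form, a check like this can flag a wrong result at a specific bin, whereas a round trip on random data only reports an aggregate error.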



