Hello,
I'm using clAmdFft version 1.6.244 (AMD APP SDK 2.6) on an Intel Xeon W3550 (running 64-bit Linux) with a Radeon HD 7970.
When performing large 1D single precision complex FFTs with a length of 2^20 or 2^21, the results appear to be incorrect.
A forward transform followed by a backward transform of a vector 2^20 or 2^21 elements long gives an RMSE of ~0.328, whereas the same round trip on a 2^19 vector gives an RMSE of ~2.1717e-07. Strangely, going larger than 2^21 (e.g. 2^22) also appears to work correctly, as does a length that is not a pure power of 2 (e.g. 2^19*3 returns the expected values).
Is this a known issue?
Any help or suggestions would be greatly appreciated.
Thanks.
Hi myrv,
Thanks for posting. I assume you are using the OpenCL GPU target (7970). What version of the graphics driver and APP SDK are you using? You can upgrade to the 1.8 version of the FFT libraries and see if it works for you. But please understand that the libraries are still 'beta'-only on the 7000 series cards. There are still some known issues in our software layers that prevent a full release.
We'll perform testing of those specific sizes locally and give an update if there are any more details.
Yes, I'm using the 7970 as my OpenCL target. It is a stock PowerColor 3GB card running at stock speeds.
I am using the AMD-APP SDK 2.7
The video drivers are reported as:
AMD Radeon HD 7900 Series:
BIOS Version: 015.012.000.004.000346
Catalyst Version: 12.4
Driver Packaging Version: 8.961-120405a-137531C-ATI
2D Driver Version: 8.96.4
OpenGL Version: 4.2.11631 Compatibility Profile Context
My machine is a RHEL 5.5 machine:
uname -a
Linux 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
The X-server is as follows:
X Window System Version 7.1.1
Release Date: 12 May 2006
X Protocol Version 11, Revision 0, Release 7.1.1
Build Operating System: Linux 2.6.18-164.11.1.el5 x86_64 Red Hat, Inc.
Build ID: xorg-x11-server 1.1.1-48.76.el5
Further testing with a simplified version of my code still shows problems with clAmdFft 1.6 at vector sizes 2^20 and 2^21. It also showed issues at 2^16, but these were far more random: ~80% of the time 2^16 would work, and the other 20% it would either return incorrect answers or lock up my machine. I am, however, new to all this, so I may have screwed something up. I can send my code, but I'm sure the last thing you want to do is debug someone else's code.
The following run created 5 random interleaved complex (float) vectors, performed a forward and backward transform on each, and compared the result to the original (I also tried using doubles and the results were essentially the same):
clFFT client API version: 1.6.244
clFFT runtime version: 1.6.244
Size (2^19)= 524288
mem_size: 4194304
RMSE: 2.17056e-07
RMSE: 2.12428e-07
RMSE: 2.13238e-07
RMSE: 2.07946e-07
RMSE: 2.22984e-07
Size (2^20)= 1048576
mem_size: 8388608
RMSE: 0.247278
RMSE: 0.242976
RMSE: 0.254346
RMSE: 0.252244
RMSE: 0.246728
Size (2^21)= 2097152
mem_size: 16777216
RMSE: 0.245888
RMSE: 0.248527
RMSE: 0.257636
RMSE: 0.253573
RMSE: 0.251165
Size (2^22)= 4194304
mem_size: 33554432
RMSE: 2.43334e-07
RMSE: 2.26586e-07
RMSE: 2.33932e-07
RMSE: 2.28084e-07
RMSE: 2.28967e-07
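For anyone who wants to reproduce this kind of check without the GPU library, here is a minimal sketch of the round-trip test described above. It is not the code used for the numbers in this post: the transform is a plain CPU radix-2 FFT standing in for the clAmdFft calls, and the names `fft_radix2`, `rmse`, `round_trip_rmse` and `mem_size_bytes` are made up for illustration. It assumes a C++11 compiler, an unscaled backward transform (so the result is divided by N by hand), and an RMSE taken over complex elements, which may not match how the figures above were computed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

typedef std::complex<float> cfloat;

// Bytes needed for 2^log2n interleaved single precision complex values.
std::size_t mem_size_bytes(unsigned log2n) {
    return (static_cast<std::size_t>(1) << log2n) * sizeof(cfloat);
}

// In-place iterative radix-2 FFT; a CPU stand-in for the library transform.
// sign = -1 is the forward transform, sign = +1 the unscaled backward one.
void fft_radix2(std::vector<cfloat>& a, int sign) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // length must be a power of two
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = 3.14159265358979323846;
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::size_t half = len / 2;
        for (std::size_t k = 0; k < half; ++k) {
            // Twiddle factor computed in double, then rounded to float.
            const double ang = sign * 2.0 * pi * static_cast<double>(k) /
                               static_cast<double>(len);
            const cfloat w(static_cast<float>(std::cos(ang)),
                           static_cast<float>(std::sin(ang)));
            for (std::size_t start = 0; start < n; start += len) {
                const cfloat u = a[start + k];
                const cfloat v = a[start + k + half] * w;
                a[start + k] = u + v;
                a[start + k + half] = u - v;
            }
        }
    }
}

// Root mean square error between two complex vectors, accumulated in double.
double rmse(const std::vector<cfloat>& x, const std::vector<cfloat>& y) {
    assert(x.size() == y.size() && !x.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += std::norm(std::complex<double>(x[i]) - std::complex<double>(y[i]));
    }
    return std::sqrt(sum / static_cast<double>(x.size()));
}

// Random vector -> forward -> backward -> scale by 1/N -> compare to original.
double round_trip_rmse(unsigned log2n, unsigned seed) {
    const std::size_t n = static_cast<std::size_t>(1) << log2n;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<cfloat> original(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float re = dist(gen);
        const float im = dist(gen);
        original[i] = cfloat(re, im);
    }
    std::vector<cfloat> work(original);
    fft_radix2(work, -1);
    fft_radix2(work, +1);
    const float scale = 1.0f / static_cast<float>(n);
    for (std::size_t i = 0; i < n; ++i) work[i] *= scale;
    return rmse(original, work);
}
```

Calling `round_trip_rmse(19, 1u)` from a small `main` runs one iteration at 2^19. With a correct transform the value should stay close to single precision rounding error, so a result in the 0.25 range points at the transform rather than at the comparison. `mem_size_bytes(19)` gives the 4194304 bytes shown as mem_size above, assuming 8 bytes per complex float.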
Switching to clAmdFft-1.8.269 seems to fix the issue. Exactly the same code, just compiled against the new library:
clFFT client API version: 1.8.269
clFFT runtime version: 1.8.269
Size (2^19)= 524288
mem_size: 4194304
RMSE: 2.17056e-07
RMSE: 2.12428e-07
RMSE: 2.13238e-07
RMSE: 2.07946e-07
RMSE: 2.22984e-07
Size (2^20)= 1048576
mem_size: 8388608
RMSE: 2.1757e-07
RMSE: 2.15285e-07
RMSE: 2.28142e-07
RMSE: 2.40964e-07
RMSE: 2.28345e-07
The only odd thing I found with 1.8.269 is that double precision transforms of vectors of length 2^16 report exactly the same RMSE for any random input, which is puzzling to say the least.
clFFT client API version: 1.8.269
clFFT runtime version: 1.8.269
Size (2^16)= 65536
mem_size: 1048576
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
Anyway, thanks for the response. I'll keep looking at my code but I can't see anything I've obviously done wrong.
Hi myrv,
Glad to see that 1.8 is working for you. The RMSE values for length 2^16 do look puzzling. Let us do some experiments locally and see what is going on. Please post your code if possible and we will help wherever we can.
Here is the source for my double precision version. It makes use of clAmdFft.openCL.cpp from the clAmdFft library sample directory (I've included the sample files in the zip, but you should be able to use the ones from the library itself). In fact, a lot of the code was lifted from the FFT samples.
It was built with g++44 (GCC) 4.4.0 20090514 (ya, old, but it's what the company provides) using the following:
g++44 -I/opt/clAmdFft-1.8.269/include -I/opt/AMDAPP/include dsimpleFFT.cpp clAmdFft.openCL.cpp -o dsimplefft -L/opt/clAmdFft-1.8.269/lib64 -lclAmdFft.Runtime -lm
You should be able to run it by providing a power of two and number of iterations as command line parameters:
$ ./dsimplefft 16 5
clFFT client API version: 1.8.269
clFFT runtime version: 1.8.269
Size (2^16)= 65536
mem_size: 1048576
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
RMSE: 3.2768e-11
It will be interesting to see if you get a different result.
Hi,
I ran your code and was able to reproduce the same RMSE across multiple runs for 2^16 FFT sizes. I examined the inputs and outputs and everything seems fine. For most input vectors at the 2^16 size the library seems to calculate very accurate FFTs for some reason, and hence the same minimal RMSE repeats. Also, random inputs are not representative of real-world cases and are not good at exposing real issues. I have verified the library output at the 2^16 size with some meaningful input vectors against popular libraries such as ACML and FFTW, and I am convinced things are working correctly. If you run into further issues, please contact us.
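As an illustration of what a non-random test input can look like, here is a rough sketch that uses a single complex tone, whose forward transform is known analytically: N at the tone's bin and zero everywhere else. This is only one possible choice of "meaningful" input and is not necessarily what was used for the ACML/FFTW comparison mentioned above. The helper names (`make_tone`, `dft_forward`, `tone_spectrum_error`) are hypothetical, and the naive O(N^2) DFT is just a placeholder for whichever library output is being checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cdouble;
static const double kPi = 3.14159265358979323846;

// x[j] = exp(+2*pi*i*j*k/n): a single complex tone sitting exactly on bin k.
std::vector<cdouble> make_tone(std::size_t n, std::size_t k) {
    assert(n > 0 && k < n);
    std::vector<cdouble> x(n);
    for (std::size_t j = 0; j < n; ++j) {
        // Reduce j*k modulo n first so the angle stays small and accurate.
        const double ang = 2.0 * kPi * static_cast<double>((j * k) % n) /
                           static_cast<double>(n);
        x[j] = cdouble(std::cos(ang), std::sin(ang));
    }
    return x;
}

// Naive forward DFT; stands in for the output of the library under test.
std::vector<cdouble> dft_forward(const std::vector<cdouble>& x) {
    const std::size_t n = x.size();
    std::vector<cdouble> X(n);
    for (std::size_t m = 0; m < n; ++m) {
        cdouble acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double ang = -2.0 * kPi * static_cast<double>((j * m) % n) /
                               static_cast<double>(n);
            acc += x[j] * cdouble(std::cos(ang), std::sin(ang));
        }
        X[m] = acc;
    }
    return X;
}

// Largest deviation from the ideal spectrum (n at bin k, 0 elsewhere),
// divided by n so the result is comparable across lengths.
double tone_spectrum_error(const std::vector<cdouble>& X, std::size_t k) {
    const std::size_t n = X.size();
    assert(n > 0 && k < n);
    double worst = 0.0;
    for (std::size_t m = 0; m < n; ++m) {
        const cdouble expected =
            (m == k) ? cdouble(static_cast<double>(n), 0.0) : cdouble(0.0, 0.0);
        worst = std::max(worst, std::abs(X[m] - expected));
    }
    return worst / static_cast<double>(n);
}
```

To check a real library, replace the `dft_forward` call with the library's forward transform of `make_tone(n, k)` and pass its output to `tone_spectrum_error`. Unlike a round-trip RMSE, this compares against a known answer, so it can also catch errors that happen to cancel between the forward and backward transforms.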