Archives Discussions

myrv · ‎06-25-2012

Hello,

I'm using clAmdFft version clAmdFft-1.6.244 (AMD APP SDK 2.6) on an Intel Xeon W3550 (running 64-bit Linux) with a Radeon HD 7970.

When performing large 1D single-precision complex FFTs with a length of 2^20 or 2^21, the results appear to be incorrect.

Doing a forward then backward transform of a vector 2^20 or 2^21 elements long gives an RMSE of ~0.328, whereas a forward/backward transform of a 2^19 vector gives an RMSE of ~2.1717e-07. Strangely, going larger than 2^21 (e.g. 2^22) also appears to work correctly. Making the length something other than a pure power of 2 also appears to work (e.g. 2^19*3 returns the expected values).

Is this a known issue?

Any help or suggestions would be greatly appreciated.

Thanks.

bragadeesh · ‎06-26-2012

Hi myrv,

Thanks for posting. I assume you are using the OpenCL GPU target (7970). What version of the graphics driver and APP SDK are you using? You can upgrade to the 1.8 version of the FFT libraries and see if it works for you. But please understand that the libraries are still 'beta'-only on the 7000 series cards. There are still some known issues in our software layers that prevent a full release.

We'll perform testing of those specific sizes locally and give an update if there are any more details.

myrv · ‎06-27-2012

Yes, I'm using the 7970 as my OpenCL target. It is a stock PowerColor 3GB card running at stock speeds.

I am using the AMD-APP SDK 2.7

The video drivers are reported as:

AMD Radeon HD 7900 Series:

BIOS Version: 015.012.000.004.000346

Catalyst Version: 12.4

Driver Packaging Version: 8.961-120405a-137531C-ATI

2D Driver Version: 8.96.4

OpenGL Version: 4.2.11631 Compatibility Profile Context

My machine is a RHEL 5.5 machine:

uname -a

Linux 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

The X-server is as follows:

X Window System Version 7.1.1

Release Date: 12 May 2006

X Protocol Version 11, Revision 0, Release 7.1.1

Build Operating System: Linux 2.6.18-164.11.1.el5 x86_64 Red Hat, Inc.

Build ID: xorg-x11-server 1.1.1-48.76.el5

Further testing with a simplified version of my code still shows problems with clAmdFft 1.6 at vector sizes 2^20 and 2^21. It also showed issues with 2^16, but these were far more random: ~80% of the time 2^16 would work, and the other ~20% it would either return incorrect answers or lock up my machine. I am, however, new to all this, so I may have screwed something up. I can send my code, but I'm sure the last thing you want to do is debug someone else's code.

The following run created 5 random interleaved complex (float) vectors, then performed a forward and backward transform on each and compared the result to the original (I also tried using doubles and the results were essentially the same):

clFFT client API version: 1.6.244

clFFT runtime version: 1.6.244

Size (2^19)= 524288

mem_size: 4194304

RMSE: 2.17056e-07

RMSE: 2.12428e-07

RMSE: 2.13238e-07

RMSE: 2.07946e-07

RMSE: 2.22984e-07

Size (2^20)= 1048576

mem_size: 8388608

RMSE: 0.247278

RMSE: 0.242976

RMSE: 0.254346

RMSE: 0.252244

RMSE: 0.246728

Size (2^21)= 2097152

mem_size: 16777216

RMSE: 0.245888

RMSE: 0.248527

RMSE: 0.257636

RMSE: 0.253573

RMSE: 0.251165

Size (2^22)= 4194304

mem_size: 33554432

RMSE: 2.43334e-07

RMSE: 2.26586e-07

RMSE: 2.33932e-07

RMSE: 2.28084e-07

RMSE: 2.28967e-07
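
For readers who want to reproduce this kind of check without the original source, here is a minimal sketch of the round-trip test. It is not the code that produced the numbers above: the FFT is a plain CPU radix-2 stand-in rather than clAmdFft, the RMSE is assumed to be taken over complex elements (the thread does not give the exact definition), and the names fft_radix2, rmse and roundtrip_rmse are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using cfloat = std::complex<float>;

// In-place iterative radix-2 FFT on the CPU (stand-in, NOT clAmdFft).
// The length must be a power of two; the inverse applies 1/N scaling
// so that forward followed by inverse returns the input.
void fft_radix2(std::vector<cfloat>& x, bool inverse) {
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    const double pi = std::acos(-1.0);
    // Butterfly passes; twiddles are computed in double, stored as float.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        for (std::size_t i = 0; i < n; i += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const double a = ang * static_cast<double>(k);
                const cfloat w(static_cast<float>(std::cos(a)),
                               static_cast<float>(std::sin(a)));
                const cfloat u = x[i + k];
                const cfloat v = x[i + k + len / 2] * w;
                x[i + k] = u + v;
                x[i + k + len / 2] = u - v;
            }
        }
    }
    if (inverse) {
        for (auto& v : x) v /= static_cast<float>(n);
    }
}

// Root-mean-square error over complex elements (assumed definition):
// sqrt( mean( |a[i] - b[i]|^2 ) ), accumulated in double.
double rmse(const std::vector<cfloat>& a, const std::vector<cfloat>& b) {
    assert(a.size() == b.size() && !a.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double dr = static_cast<double>(a[i].real()) - b[i].real();
        const double di = static_cast<double>(a[i].imag()) - b[i].imag();
        sum += dr * dr + di * di;
    }
    return std::sqrt(sum / static_cast<double>(a.size()));
}

// Fill a 2^log2n vector with uniform random values in [0, 1),
// run forward then inverse, and return the RMSE against the original.
double roundtrip_rmse(unsigned log2n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<cfloat> original(std::size_t{1} << log2n);
    for (auto& v : original) v = cfloat(dist(gen), dist(gen));
    std::vector<cfloat> work = original;
    fft_radix2(work, false);
    fft_radix2(work, true);
    return rmse(original, work);
}
```

With a correct transform, roundtrip_rmse should come out around single-precision rounding level, which matches the ~2e-07 figures at 2^19 and 2^22 above; a value around 0.25 for inputs in [0, 1) means the output is substantially wrong rather than just noisy.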

Switching to clAmdFft-1.8.269 seems to fix the issue. Exactly the same code, just compiled against the new library:

clFFT client API version: 1.8.269

clFFT runtime version: 1.8.269

Size (2^19)= 524288

mem_size: 4194304

RMSE: 2.17056e-07

RMSE: 2.12428e-07

RMSE: 2.13238e-07

RMSE: 2.07946e-07

RMSE: 2.22984e-07

Size (2^20)= 1048576

mem_size: 8388608

RMSE: 2.1757e-07

RMSE: 2.15285e-07

RMSE: 2.28142e-07

RMSE: 2.40964e-07

RMSE: 2.28345e-07

The only odd thing I found with 1.8.269 is that double-precision transforms of vectors of length 2^16 report exactly the same RMSE for any random input, which is puzzling to say the least.

clFFT client API version: 1.8.269

clFFT runtime version: 1.8.269

Size (2^16)= 65536

mem_size: 1048576

RMSE: 3.2768e-11

Anyway, thanks for the response. I'll keep looking at my code but I can't see anything I've obviously done wrong.

bragadeesh · ‎06-27-2012

Hi myrv,

Glad to see that 1.8 is working for you. The RMSE values for length 2^16 look puzzling indeed. Let us do some experiments locally and see what is going on. Please post your code if possible and we will help wherever we can.

myrv · ‎06-27-2012

Here is the source for my double-precision version. It makes use of clAmdFft.openCL.cpp from the clAmdFft library sample directory (I've included the sample files in the zip, but you should be able to use the ones from the library itself). Actually, a lot of the code itself was lifted from the FFT samples.

It was built with g++44 (GCC) 4.4.0 20090514 (ya, old, but it's what the company provides) using the following:

g++44 -I/opt/clAmdFft-1.8.269/include -I/opt/AMDAPP/include dsimpleFFT.cpp clAmdFft.openCL.cpp -o dsimplefft -L/opt/clAmdFft-1.8.269/lib64 -lclAmdFft.Runtime -lm

You should be able to run it by providing a power of two and number of iterations as command line parameters:

$ ./dsimplefft 16 5

clFFT client API version: 1.8.269

clFFT runtime version: 1.8.269

Size (2^16)= 65536

mem_size: 1048576

RMSE: 3.2768e-11

It will be interesting to see if you get a different result.

bragadeesh · ‎07-03-2012

Hi,

I ran your code and was able to reproduce the same RMSE across multiple runs for 2^16 FFT sizes. I examined the inputs and outputs and everything seems fine. For most input vectors at 2^16 size, the library seems to calculate very accurate FFTs for some reason, and hence the same minimal RMSE repeats. Also, random inputs are not good real-world cases and are not good at spotting real issues. I have verified the library output at 2^16 size with some meaningful input vectors against popular libraries such as ACML and FFTW, and I am convinced things are working correctly. If you run into further issues, please contact us.
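
The reply does not say which "meaningful" inputs were used. One common choice, shown here purely as an assumption, is a single complex tone, because its forward transform is known analytically: N at the tone's bin and zero everywhere else. The sketch below builds such a tone and measures how far a spectrum deviates from that answer. The names make_tone, max_tone_error and naive_dft are hypothetical, and the O(N^2) DFT is only a CPU stand-in so the example is self-contained; in practice the spectrum would come from the library under test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cdouble = std::complex<double>;

// Single complex tone: x[i] = exp(2*pi*j * k0 * i / N).
std::vector<cdouble> make_tone(std::size_t n, std::size_t k0) {
    assert(n > 0 && k0 < n);
    const double pi = std::acos(-1.0);
    std::vector<cdouble> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Reduce k0*i modulo N so the phase argument stays small.
        const double phase = 2.0 * pi * static_cast<double>((k0 * i) % n)
                             / static_cast<double>(n);
        x[i] = std::polar(1.0, phase);
    }
    return x;
}

// Largest absolute deviation of a forward spectrum from the analytic
// answer for a tone at bin k0: N at bin k0, 0 at every other bin.
double max_tone_error(const std::vector<cdouble>& spectrum, std::size_t k0) {
    const std::size_t n = spectrum.size();
    assert(k0 < n);
    double worst = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        const cdouble expected =
            (k == k0) ? cdouble(static_cast<double>(n), 0.0) : cdouble(0.0, 0.0);
        worst = std::max(worst, std::abs(spectrum[k] - expected));
    }
    return worst;
}

// Naive O(N^2) forward DFT, used only as a stand-in transform:
// X[k] = sum_i x[i] * exp(-2*pi*j * k * i / N).
std::vector<cdouble> naive_dft(const std::vector<cdouble>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cdouble> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cdouble acc(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            const double phase = -2.0 * pi * static_cast<double>((k * i) % n)
                                 / static_cast<double>(n);
            acc += x[i] * std::polar(1.0, phase);
        }
        out[k] = acc;
    }
    return out;
}
```

Unlike a round trip on random data, this compares the forward transform on its own against a known answer, so an error that happens to cancel between the forward and backward passes would still show up.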


clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors