5 Replies. Latest reply on Jul 3, 2012, 2:14 PM by bragadeesh

    clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors

    myrv

      Hello,

       

      I'm using clAmdFft version 1.6.244 (AMD APP SDK 2.6) on an Intel Xeon W3550 running 64-bit Linux, with a Radeon HD 7970.

       

      When performing large 1D single-precision complex FFTs with a length of 2^20 or 2^21, the results appear to be incorrect.

       

      Doing a forward and then a backward transform of a vector 2^20 or 2^21 elements long gives an RMSE of ~0.328, whereas a forward/backward transform of a 2^19 vector gives an RMSE of ~2.1717e-07. Strangely, sizes larger than 2^21 (e.g., 2^22) appear to work correctly. Lengths that are not pure powers of 2 also appear to work (e.g., a length of 2^19*3 returns the expected values).
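      The forward/backward round-trip check described above can be sketched on the CPU as follows. This is an illustration, not the poster's program: naiveDft is a hypothetical O(N^2) reference (far too slow for 2^20 points, but fine for small sanity tests), and rmse uses one common definition, sqrt(sum |a_i - b_i|^2 / N). The original code may define RMSE differently (for example, per real component), so absolute values may not match exactly.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical naive O(N^2) DFT used only as a CPU reference (not part of
// clAmdFft). sign = -1 is the forward transform, +1 the backward transform.
// The backward transform is scaled by 1/N, so forward followed by backward
// should reproduce the input up to rounding error.
template <typename T>
std::vector<std::complex<T>> naiveDft(const std::vector<std::complex<T>>& in, int sign) {
    assert(sign == 1 || sign == -1);
    const std::size_t n = in.size();
    const T pi = std::acos(T(-1));
    std::vector<std::complex<T>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<T> acc(0, 0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce k*j modulo n so the twiddle angle stays in [0, 2*pi).
            const T angle = sign * 2 * pi * T((k * j) % n) / T(n);
            acc += in[j] * std::polar(T(1), angle);
        }
        out[k] = (sign == 1) ? acc / T(n) : acc;
    }
    return out;
}

// Root-mean-square error between two equal-length complex vectors,
// accumulated in double: sqrt( sum |a_i - b_i|^2 / N ).
template <typename T>
double rmse(const std::vector<std::complex<T>>& a, const std::vector<std::complex<T>>& b) {
    assert(a.size() == b.size() && !a.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::norm(std::complex<double>(a[i]) - std::complex<double>(b[i]));
    }
    return std::sqrt(sum / double(a.size()));
}
```

      A correct library should give a round-trip RMSE near single-precision rounding (on the order of 1e-7 for inputs of magnitude ~1), which is what the 2^19 and 2^22 runs show; values around 0.25 to 0.33 mean the output bears little resemblance to the input.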

       

      Is this a known issue?

       

      Any help or suggestions would be greatly appreciated.

      Thanks.

        • Re: clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors
          bragadeesh

          Hi myrv,

           

          Thanks for posting. I assume you are using the OpenCL GPU target (the 7970). Which versions of the graphics driver and APP SDK are you using? You can upgrade to version 1.8 of the FFT libraries and see if that works for you. But please understand that the libraries are still beta-only on 7000-series cards; there are still some known issues in our software layers that prevent a full release.

           

          We'll perform testing of those specific sizes locally and give an update if there are any more details.

            • Re: clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors
              myrv

              Yes, I'm using the 7970 as my OpenCL target. It is a stock PowerColor 3GB card running at stock speeds.

               

              I am using AMD APP SDK 2.7.

               

              The video drivers are reported as:

               

              AMD Radeon HD 7900 Series:

              BIOS Version: 015.012.000.004.000346

              Catalyst Version: 12.4

              Driver Packaging Version: 8.961-120405a-137531C-ATI

              2D Driver Version: 8.96.4

              OpenGL Version: 4.2.11631 Compatibility Profile Context

               

              My machine runs RHEL 5.5:

              uname -a

              Linux  2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

               

              The X-server is as follows:

               

              X Window System Version 7.1.1

              Release Date: 12 May 2006

              X Protocol Version 11, Revision 0, Release 7.1.1

              Build Operating System: Linux 2.6.18-164.11.1.el5 x86_64 Red Hat, Inc.

              Build ID: xorg-x11-server 1.1.1-48.76.el5

               

               

              Further testing with a simplified version of my code still shows problems with clAmdFft 1.6 at vector sizes 2^20 and 2^21. It also showed issues with 2^16, but these were far more intermittent: about 80% of the time 2^16 worked, and the other 20% it either returned incorrect answers or locked up my machine. I am, however, new to all this, so I may have screwed something up. I can send my code, but I'm sure the last thing you want to do is debug someone else's code.

               

              The following run created five random interleaved complex (float) vectors, performed a forward and then a backward transform on each, and compared the result to the original (I also tried doubles, and the results were essentially the same):
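              The mem_size values in the output follow from the interleaved layout: each complex element occupies two scalars (re, im), so 2^19 floats-pairs need 2^19 * 2 * 4 = 4,194,304 bytes, and 2^16 double pairs need 2^16 * 2 * 8 = 1,048,576 bytes. The sketch below shows that arithmetic plus a way to fill such a buffer with random data. The uniform [-1, 1) distribution, the fixed seed, and the helper names are assumptions for illustration; the original program's input generator is not shown in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Bytes needed for n complex values stored interleaved as re0, im0, re1, im1, ...
// Scalar is float (single precision) or double (double precision).
template <typename Scalar>
std::size_t interleavedBytes(std::size_t n) {
    return n * 2 * sizeof(Scalar);
}

// Hypothetical helper: fill an interleaved buffer with n random complex values,
// each component drawn uniformly from [-1, 1). A fixed seed makes runs
// reproducible; use a different seed per test vector.
template <typename Scalar>
std::vector<Scalar> randomInterleaved(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<Scalar> dist(Scalar(-1), Scalar(1));
    std::vector<Scalar> buf(2 * n);
    for (Scalar& v : buf) v = dist(gen);
    // The host buffer size matches the device buffer size the FFT would use.
    assert(buf.size() * sizeof(Scalar) == interleavedBytes<Scalar>(n));
    return buf;
}
```

              Generating the five test vectors would then be a loop such as `for (unsigned s = 0; s < 5; ++s) randomInterleaved<float>(std::size_t(1) << 19, s);`, with each buffer copied to the device before the forward transform.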

               

              clFFT client API version: 1.6.244

              clFFT runtime version:    1.6.244

               

              Size (2^19)= 524288

              mem_size: 4194304

              RMSE: 2.17056e-07

              RMSE: 2.12428e-07

              RMSE: 2.13238e-07

              RMSE: 2.07946e-07

              RMSE: 2.22984e-07

               

              Size (2^20)= 1048576

              mem_size: 8388608

              RMSE: 0.247278

              RMSE: 0.242976

              RMSE: 0.254346

              RMSE: 0.252244

              RMSE: 0.246728

               

              Size (2^21)= 2097152

              mem_size: 16777216

              RMSE: 0.245888

              RMSE: 0.248527

              RMSE: 0.257636

              RMSE: 0.253573

              RMSE: 0.251165

               

              Size (2^22)= 4194304

              mem_size: 33554432

              RMSE: 2.43334e-07

              RMSE: 2.26586e-07

              RMSE: 2.33932e-07

              RMSE: 2.28084e-07

              RMSE: 2.28967e-07

               

              Switching to clAmdFft-1.8.269 seems to fix the issue, with exactly the same code, just compiled against the new library:

               

              clFFT client API version: 1.8.269

              clFFT runtime version:    1.8.269

               

              Size (2^19)= 524288

              mem_size: 4194304

              RMSE: 2.17056e-07

              RMSE: 2.12428e-07

              RMSE: 2.13238e-07

              RMSE: 2.07946e-07

              RMSE: 2.22984e-07

               

              Size (2^20)= 1048576

              mem_size: 8388608

              RMSE: 2.1757e-07

              RMSE: 2.15285e-07

              RMSE: 2.28142e-07

              RMSE: 2.40964e-07

              RMSE: 2.28345e-07

               

              The only odd thing I found with 1.8.269 is that double-precision transforms of length-2^16 vectors report exactly the same RMSE for every random input, which is puzzling to say the least.
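              One cheap check is whether the five runs are really identical or only look identical at std::cout's default precision of six significant digits. The sketch below (hypothetical helper names) prints a double both ways; std::numeric_limits<double>::max_digits10 digits are enough to distinguish any two distinct doubles, so if the full-precision RMSE values still match across runs, the results really are identical.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format a double as std::cout does by default (6 significant digits).
std::string formatDefault(double x) {
    std::ostringstream os;
    os << x;
    return os.str();
}

// Format a double with max_digits10 significant digits, enough to
// round-trip any double, so distinct values always print differently.
std::string formatFull(double x) {
    std::ostringstream os;
    os << std::setprecision(std::numeric_limits<double>::max_digits10) << x;
    return os.str();
}
```

              For example, 3.2768e-11 and a value differing from it by one part in 10^9 both print as 3.2768e-11 by default, but differ when formatted with formatFull.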

               

              clFFT client API version: 1.8.269

              clFFT runtime version:    1.8.269

               

              Size (2^16)= 65536

              mem_size: 1048576

              RMSE: 3.2768e-11

              RMSE: 3.2768e-11

              RMSE: 3.2768e-11

              RMSE: 3.2768e-11

              RMSE: 3.2768e-11

               

              Anyway, thanks for the response. I'll keep looking at my code, but I can't see anything I've obviously done wrong.

                • Re: clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors
                  bragadeesh

                  Hi myrv,

                   

                  Glad to see that 1.8 is working for you. The RMSE values for length 2^16 look puzzling indeed. We'll run some experiments locally to see what is going on. Please post your code if possible and we will help wherever we can.

                    • Re: clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors
                      myrv

                      Here is the source for my double-precision version. It uses clAmdFft.openCL.cpp from the clAmdFft library's sample directory (I've included those files in the zip, but you should be able to use the ones shipped with the library). Much of the code itself was actually lifted from the FFT samples.

                       

                      It was built with g++44 (GCC) 4.4.0 20090514 (yes, old, but it's what the company provides) using the following:

                       

                      g++44 -I/opt/clAmdFft-1.8.269/include -I/opt/AMDAPP/include dsimpleFFT.cpp clAmdFft.openCL.cpp  -o dsimplefft -L/opt/clAmdFft-1.8.269/lib64 -lclAmdFft.Runtime -lm

                       

                      You should be able to run it by passing the power-of-two exponent and the number of iterations as command-line arguments:

                       

                      $ ./dsimplefft   16   5

                      clFFT client API version: 1.8.269

                      clFFT runtime version:    1.8.269

                       

                      Size (2^16)= 65536

                      mem_size: 1048576

                      RMSE: 3.2768e-11

                      RMSE: 3.2768e-11

                      RMSE: 3.2768e-11

                      RMSE: 3.2768e-11

                      RMSE: 3.2768e-11

                       

                      It will be interesting to see if you get a different result.

                        • Re: clAmdFft appears to give incorrect results for 2^20 and 2^21 length vectors
                          bragadeesh

                          Hi,

                           

                          I ran your code and was able to reproduce the identical RMSE across multiple runs at size 2^16. I examined the inputs and outputs and everything seems fine. For most input vectors at size 2^16, the library seems to compute very accurate FFTs for some reason, so the same minimal RMSE keeps repeating. Also, random inputs are not representative of real-world cases and are not good at exposing real issues. I have verified the library output at size 2^16 with some meaningful input vectors against popular libraries such as ACML and FFTW, and I am convinced things are working correctly. If you run into further issues, please contact us.
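                          One kind of meaningful input is a signal whose transform is known analytically. A single complex tone x[j] = exp(2*pi*i*k0*j/N) has a forward DFT equal to N at bin k0 and zero everywhere else, assuming the exp(-2*pi*i*j*k/N) sign convention and an unscaled forward transform (check the plan's scale setting before relying on this). The sketch below, with hypothetical helper names, builds such a tone and measures how far a computed spectrum strays from the exact answer; obtaining the spectrum (e.g., by running the clAmdFft forward transform and reading the buffer back to the host) is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// x[j] = exp(+2*pi*i*k0*j/N): a single complex tone at frequency bin k0.
std::vector<std::complex<double>> makeTone(std::size_t n, std::size_t k0) {
    assert(k0 < n);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> x(n);
    for (std::size_t j = 0; j < n; ++j) {
        // Reduce k0*j modulo n so the angle stays in [0, 2*pi).
        x[j] = std::polar(1.0, 2.0 * pi * double((k0 * j) % n) / double(n));
    }
    return x;
}

// Largest deviation of a forward-transform result from the analytic spectrum
// of makeTone(n, k0): N at bin k0, exactly zero in every other bin.
double toneSpectrumError(const std::vector<std::complex<double>>& spectrum, std::size_t k0) {
    const double n = double(spectrum.size());
    double worst = 0.0;
    for (std::size_t k = 0; k < spectrum.size(); ++k) {
        const std::complex<double> expected =
            (k == k0) ? std::complex<double>(n, 0.0) : std::complex<double>(0.0, 0.0);
        worst = std::max(worst, std::abs(spectrum[k] - expected));
    }
    return worst;
}
```

                          Unlike a round-trip RMSE, this checks the forward transform on its own, so an error in the forward pass cannot be masked by a compensating error in the backward pass. Sweeping k0 over several bins exercises different twiddle factors.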