5 Replies Latest reply on May 14, 2009 2:14 PM by jim.conyngham@amd.com

    Fast Fourier Transform in ACML 4.2.0

      How to improve accuracy?


      I've got a small problem with ZFFT2DX in ACML 4.2.0.

      I wrote a sample program - all it does is take a 2D array of real data, take the forward Fourier transform, multiply it by 1/scale, and then take the backward Fourier transform. In the ideal case I would get the original array back. But in the real world I get tiny imaginary parts, which are close to 0. Is there any way to get rid of these tiny non-zero elements in the final result?
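      The effect is easy to reproduce outside ACML too. Here is a minimal sketch of the same round trip in Python, using a naive O(n^2) DFT purely for illustration (not ACML, and not a fast FFT):

```python
import cmath

def dft(z, sign):
    # naive O(n^2) DFT: sign=-1 forward, +1 backward (unscaled)
    n = len(z)
    return [sum(z[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

data = [complex(v, 0.0) for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
forward = dft(data, -1)
scaled = [v / len(data) for v in forward]   # multiply by 1/scale
back = dft(scaled, +1)                      # backward transform
worst_imag = max(abs(v.imag) for v in back)
print(worst_imag)   # tiny, but in general not exactly zero
```

The imaginary residue is on the order of machine epsilon times the data magnitude, just as with the ACML program below.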

      The reason I'm asking is that in my basic program I combine two real quantities into a complex number and perform the procedure described above (in order to take one FFT instead of two). And currently it means that one quantity influences another.

      I use ACML 4.2.0 in conjunction with ifort (latest version) OR gfortran. Same results for both compilers.

      Compiling lines:

      gfortran -m64 -static acml_fft_check.f -L/opt/acml4.2.0/gfortran64/lib -lacml

      ifort -O3 -static acml_fft_check.f -L/opt/acml4.2.0/ifort64/lib/ -lacml


      The code of sample program:

        program compare_ffts
        implicit real*8 (a-h,o-z)
        parameter (N=5)
        complex*16 COMM(N*N+6*N+200), m1(N,N)
        complex*16 CMPT1(N,N), CMPT2(N,N), diff(N,N)
        real*8 sc, scale
        logical ltrans, inpl

        ltrans = .true.
        inpl   = .false.
        sc     = 1.0d0
        scale  = dble(N*N)

!       fill the input array with purely real test data
        do j=1,N
           do i=1,N
              CMPT1(i,j) = dcmplx(dble(i+j), 0.0d0)
           end do
        end do

!       mode 100: initialize the communication array
        call zfft2dx(100,sc,ltrans,inpl,N,N,CMPT1,1,N,m1,1,N,COMM,INFO)

!       forward transform: CMPT1 -> m1
        call zfft2dx(-1,sc,ltrans,inpl,N,N,CMPT1,1,N,m1,1,N,COMM,INFO)

!       multiply by 1/scale
        do j=1,N
           do i=1,N
              m1(i,j) = m1(i,j)/scale
           end do
        end do

!       backward transform: m1 -> CMPT2
        call zfft2dx(1,sc,ltrans,inpl,N,N,m1,1,N,CMPT2,1,N,COMM,INFO)

!       difference from the original data
        do j=1,N
           do i=1,N
              diff(i,j) = CMPT2(i,j) - CMPT1(i,j)
           end do
        end do

        write(*,*) 'CMPT_ACML AFTER'
        write(*,*) 'mean_real= ', sum(dble(diff))/scale
        write(*,*) 'mean_imag= ', sum(dimag(diff))/scale
        end


      I should also mention that there is no problem if N<=4.

      Also, these non-zero imaginary values are smaller when N = 4, 16, 64, 256. However, a further increase to N = 512 or 1024 doesn't lead to a decrease in the error.

      In my basic program I use N=512.


      GCC/GFORTRAN: version 4.3.2

      IFORT: Version 11.0

        • Fast Fourier Transform in ACML 4.2.0

          The non-zero imaginary parts in the results that you observe are just due to rounding error. When the forward transform is computed, the results you get are not exact, they have been rounded to double precision - it's just that the rounding error is hard to see because the results are non-zero. And it's not just the final results that got rounded, but many intermediate results as well.

          Then when you perform the backward FFT, the results get rounded again. This time the rounding errors stick out like a sore thumb because you know that you are expecting to get exact zeros in the solution. The magnitude of the non-zero parts - around 10^(-16) - is consistent with what you'd expect from rounding errors, because double precision floating-point arithmetic is accurate to around 16 decimal places.

          You can easily observe the effects of rounding error in much simpler computations than an FFT - try running this program, which in exact arithmetic would produce a result of zero:

                program t
                double precision a, b, c
                a = 7.0d0
                b = 11.0d0
                c = (1/a)*(1/b)*77.0d0 - 1.0d0
                write(*,*) 'c = ', c
                end

          Compiling with gfortran or ifort, I get this result:

             c =  -1.11022302462515654E-016

            • Fast Fourier Transform in ACML 4.2.0

              Thank you for your answer, mickpont.

              I do understand that the non-zero elements are due to rounding. I don't care about not getting back exactly the same real part of the matrix after the forward and backward Fourier transforms, but I do care about getting non-zero imaginary elements.

              Please note that if you write the same sample program using the MKL Fourier transform, it does not produce non-zero imaginary elements. Somehow they (the MKL guys) got rid of it.

              And about your example: maybe a similar error was made by the ACML guys, because if you write c = 77.0d0/(a*b) - 1.0d0 then it gives you exact zero.


                • Fast Fourier Transform in ACML 4.2.0

                  Hi maxbelkin,

                  I'm surprised that MKL somehow manages to avoid the non-zero imaginary parts, though I confess I've not tried it. I have no idea, but I could imagine that the results might be post-processed to set imaginary parts of results to exact zero if they fall below a certain tolerance level.

                  About the example I gave, when the expression is written as

                  c = (1/a)*(1/b)*77.0d0 - 1.0d0

                  the compiler generates code to compute 1/a and 1/b, which are not exactly representable in binary, and so rounding error is introduced. With your rewritten expression

                  c = 77.0d0/(a*b) - 1.0d0

                  a*b is exactly representable in binary (because it's an exact integer) and so the compiler ends up dividing 77.0 by 77.0 and consequently gets an exact 1.0 - no rounding errors involved. But my point is that an FFT algorithm is nowhere near that simple so rewriting expressions in a different order is unlikely to help.
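                  This reasoning is easy to check in any IEEE-754 double-precision environment; here is the same pair of expressions as a Python sketch (Python floats are IEEE doubles, so the arithmetic matches the Fortran example above):

```python
a, b = 7.0, 11.0

rounded = (1/a) * (1/b) * 77.0 - 1.0   # 1/7 and 1/11 each get rounded
exact = 77.0 / (a * b) - 1.0           # a*b == 77.0 exactly, so 77/77 == 1.0

print(rounded)
print(exact)
```

The first expression produces a residue around 1e-16; the second is exactly zero.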

                  Which still leaves me flummoxed about how MKL can do it!

                  Have you considered post-processing the results yourself to set tiny numbers to exact zeros?
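                  Such post-processing could be as simple as the following sketch (in Python for illustration; the tolerance 1e-12 is an arbitrary choice you would tune to your data):

```python
def zero_tiny(matrix, tol=1.0e-12):
    # set real/imag parts with magnitude below tol to exact zero
    out = []
    for row in matrix:
        new_row = []
        for z in row:
            re = z.real if abs(z.real) >= tol else 0.0
            im = z.imag if abs(z.imag) >= tol else 0.0
            new_row.append(complex(re, im))
        out.append(new_row)
    return out

cleaned = zero_tiny([[1.0 + 1e-16j, -3e-17 + 2.5j]])
print(cleaned)
```

Of course this only makes sense when you know a priori that values below the tolerance must be rounding noise rather than signal.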

                    • Fast Fourier Transform in ACML 4.2.0

                      Well, there are several reasons why I cannot post-process the results myself. The first problem is that I actually work with a matrix of complex data, there is no simple division by scale between the forward and backward Fourier transforms, and sometimes I legitimately get small non-zero elements. The second problem is that in my program I use N=512, and several runs of the sample program above showed that the imaginary parts grow with N, so I actually get something on the order of 1d-8 rather than 1d-16. And the last problem is that I do a LOT of Fourier transforms, and manual post-processing would take a LOT of time, which I cannot afford.

                      But..... I rewrote my main program to use only GNU functions and compiled it with gfortran, and it worked more or less as I expected, so I believe the major problem was that I used ifort version 11.0, not 10.1 as required.

                • Fast Fourier Transform in ACML 4.2.0

                  Hi Max,

                  You wrote: "The reason I'm asking is that in my basic program I combine two real quantities into a complex number and perform the procedure described above (in order to take one FFT instead of two). And currently it means that one quantity influences another."

                  That's a mathematically correct optimization, but TANSTAAFL. I'm assuming you mean you're doing something like this, which I'm quoting from "Software Optimization for High Performance Computing" by Wadleigh and Crawford, the section "Performing Two Real FFTs with a Complex FFT":

                  • Pack xr[j] and yr[j] into a complex array z[j] by mapping xr[j] to the real locations of z[j] and yr[j] to the imaginary locations of z[j], so z[j] = complex(xr[j], yr[j])

                  • Perform an in-place, length-n FFT on z[j]

                  • Unscramble z[j] as follows:

                      x[0] = complex(zr[0], 0.0)
                      y[0] = complex(zi[0], 0.0)
                      x[j] = (1/2) (z[j] + conjg(z[n-j])),   1 <= j <= n/2
                      y[j] = (-i/2) (z[j] - conjg(z[n-j])),  1 <= j <= n/2

                  • Copy to create the conjugate-symmetric values:

                     x[j] = conjg(x[n-j]),  (n/2 + 1) <= j <= n-1
                     y[j] = conjg(y[n-j]),  (n/2 + 1) <= j <= n-1
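                  For reference, the steps above can be sketched in Python, with a naive O(n^2) DFT standing in for the FFT routine (for illustration only, not ACML):

```python
import cmath

def dft(z):
    # naive O(n^2) DFT, standing in for a fast FFT routine
    n = len(z)
    return [sum(z[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

def two_real_transforms(xr, yr):
    # pack two real sequences into one complex array, transform once,
    # then unscramble (the Wadleigh & Crawford steps above)
    n = len(xr)
    z = [complex(a, b) for a, b in zip(xr, yr)]
    Z = dft(z)
    X, Y = [0j] * n, [0j] * n
    X[0] = complex(Z[0].real, 0.0)
    Y[0] = complex(Z[0].imag, 0.0)
    for j in range(1, n // 2 + 1):
        X[j] = 0.5 * (Z[j] + Z[n - j].conjugate())
        Y[j] = -0.5j * (Z[j] - Z[n - j].conjugate())
    for j in range(n // 2 + 1, n):   # conjugate-symmetric copies
        X[j] = X[n - j].conjugate()
        Y[j] = Y[n - j].conjugate()
    return X, Y

xr = [1.0, 2.0, 3.0, 4.0]
yr = [4.0, 3.0, 2.0, 1.0]
X, Y = two_real_transforms(xr, yr)
# check against transforming each real sequence on its own
Xd = dft([complex(v, 0.0) for v in xr])
Yd = dft([complex(v, 0.0) for v in yr])
err_x = max(abs(a - b) for a, b in zip(X, Xd))
err_y = max(abs(a - b) for a, b in zip(Y, Yd))
print(err_x, err_y)   # small, but not exactly zero in general
```

The unscrambled results agree with two separate transforms up to rounding error, which is exactly where the precision issue discussed next comes from.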

                  As Wadleigh and Crawford claim, this performs the equivalent of transforming two real datasets with a little more than half the computation, and it's mathematically correct, but there is a small price you pay in precision. (Which W&C don't mention!)

                  The problem is that when the two transforms are entangled in the z[] numbers, there is an unavoidable loss of precision such that "one influences the other". Each z[j] contains a contribution from each of the input signals. If both of those contributions are of the same magnitude, you only lose 1 bit of precision, but wherever the two contributions have different magnitudes, the smaller one loses a lot of precision.

                  If, for example, the two independent transforms would produce a pair of values of O(2**10) and O(2**1), then when those two quantities are entangled in a single z[j] value and later separated, the smaller one has lost 9 bits of precision!
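                  That 9-bit loss is easy to demonstrate directly, since it comes from summing two contributions of different magnitude into a single double (a Python sketch; IEEE double precision behaves the same in any language):

```python
B = 2.0 ** 10                       # large contribution, O(2**10)
s = 2.0 ** 1 * (1.0 + 2.0 ** -52)   # O(2**1), lowest mantissa bit set
entangled = B + s                   # both contributions share one double
recovered = entangled - B           # separate the small one again
print(s)
print(recovered)                    # the low-order bits of s are gone
```

With a magnitude gap of 2**9 between the two contributions, roughly the bottom 9 bits of the smaller value cannot survive the entangling.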