2 Replies Latest reply on Jan 15, 2013 12:46 PM by gugi

    clAmdFft: Small differences in r2c transformation compared to fftw?

    gugi

      Hi everyone,

      I am doing some real-to-complex transformations using the clAmdFft library, version 1.8.239, with double precision enabled.

      When I compare the OpenCL output with the results from the fftw library (version 3.3.2), I get almost the same values, but not exactly.

       

      Here's the output of a 16x16 r2c transformation of a cosine using fftw:

      -8.411709e-020+0.000000e+000i    -1.262446e-019+1.556167e-018i    -1.532043e-018+-1.570092e-019i    -7.150663e-019+-8.849248e-019i    1.000000e-003+-2.671338e-018i    7.150663e-019+-8.849248e-019i    1.532043e-018+-1.570092e-019i    1.262446e-019+1.556167e-018i    8.411709e-020+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      

       

      And here's the output of clAmdFft (on either CPU or GPU, it doesn't matter, the result is the same):

      -8.411709e-020+0.000000e+000i    -1.262446e-019+1.556167e-018i    -1.532043e-018+-1.570092e-019i    -7.150663e-019+-8.849248e-019i    1.000000e-003+-2.602085e-018i    7.150663e-019+-8.849248e-019i    1.532043e-018+-1.570092e-019i    1.262446e-019+1.556167e-018i    8.411709e-020+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    9.629650e-035+-1.925930e-034i    9.629650e-035+4.814825e-035i    9.629650e-035+1.925930e-034i    0.000000e+000+0.000000e+000i    -9.629650e-035+1.925930e-034i    -9.629650e-035+4.814825e-035i    -9.629650e-035+-1.925930e-034i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
      

       

       

      As you can see, the results match quite well. However, there are some non-zero elements in the 9th line (Nyquist frequency), and one value in the first line (the imaginary part of the peak) also differs. I know the deviations are quite small, but I observed some notable differences in my simulation for larger transformations (1024x1024, for example). I was also wondering whether this might simply be a bug.

       

      So my question is: is this some sort of bug, am I doing something wrong, or is this simply an effect of the applied algorithm?

       

       

      (I attached an example source code, which reproduces this behaviour. It creates the files fft_fftw.dat and fft_cl.dat which contain the above posted results. Naturally it requires fftw, OpenCL and clAmdFft to build.)

      System specs: Win7 64, Visual Studio 2010 C++ x64 Build, Intel Q6600, AMD HD5850 (latest beta-driver), OpenCL 1.2 AMD-APP (1084.2)

        • Re: clAmdFft: Small differences in r2c transformation compared to fftw?
          bragadeesh

          Hi,

           

          Small numerical differences between FFT libraries are quite common. You can also compute the RMS error to get a quick idea of the overall difference. If the difference is below reasonable tolerances, then we conclude that it is not a bug. If it reaches unacceptably high levels, we'll have to investigate it.
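
The RMS comparison mentioned above could look roughly like the following. This is only a minimal sketch: the function name `relative_rms_error` and the choice to normalise by the reference spectrum are assumptions for illustration, not part of clAmdFft or fftw.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

// Relative RMS error between two spectra of equal length:
//   sqrt( sum_k |test_k - ref_k|^2 / sum_k |ref_k|^2 )
// `ref` is the reference output (e.g. from fftw), `test` the output to check
// (e.g. from clAmdFft). Both names are hypothetical.
double relative_rms_error(const std::vector<std::complex<double> >& test,
                          const std::vector<std::complex<double> >& ref)
{
    assert(test.size() == ref.size());

    double diff_sq = 0.0;  // accumulated squared magnitude of the difference
    double ref_sq = 0.0;   // accumulated squared magnitude of the reference
    for (std::size_t k = 0; k < ref.size(); ++k) {
        diff_sq += std::norm(test[k] - ref[k]);  // std::norm is |z|^2
        ref_sq += std::norm(ref[k]);
    }

    // An all-zero reference cannot be used for normalisation.
    if (ref_sq == 0.0) {
        return diff_sq == 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
    }
    return std::sqrt(diff_sq / ref_sq);
}
```

Going only by the seven digits printed in the question, the two 16x16 outputs differ in the imaginary part of the peak (by about 6.9e-20) and in the ~1e-34 entries of the ninth row. Against a peak of 1e-3 that gives a relative RMS error of roughly 7e-17, which is below the double-precision machine epsilon of about 2.2e-16.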

           

          Mathematical problems and algorithms in general have to account for these minor numerical deviations when processing the results. Have you made a more detailed assessment of the usability of the results? If, after a closer assessment, you still think that the results are too different, let us know. Attached repro code would be helpful.

           

          Could I ask what kind of simulation problems you are running? What kind of applications are you trying to use the library for?

            • Re: clAmdFft: Small differences in r2c transformation compared to fftw?
              gugi

              Thank you for your answer.

              I already thought that it is not unusual to get minor differences between different fft libraries, but I just wanted to make sure.

               

              The simulation is actually an integration of a PDE (a modified 2D Swift-Hohenberg equation, to be precise) using the pseudospectral approach (basically some Fourier transformations and elementwise matrix multiplications). I normally need matrices of at most 256x256, mostly smaller ones, and the results look OK if I compare them to a pure C++ implementation. I observed that if I crank the size up to something like 1024x1024, I get some "noise" around the Nyquist and zero frequencies that is noticeable in a log-plot of the Fourier domain, although it's still far smaller than the maximal values (however, it's not there in the fftw version).
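
One way to check whether such noise is at rounding level is to count the bins where the two spectra differ by more than a small multiple of machine epsilon times the peak magnitude. The sketch below is an assumption-laden illustration: the function name `count_bins_above_tolerance` and the parameter `tol_in_eps` are hypothetical, and a suitable tolerance depends on the transform size.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

// Counts the bins where |test_k - ref_k| exceeds tol_in_eps * eps * peak,
// where peak is the largest magnitude in `ref` and eps is the
// double-precision machine epsilon. All names here are hypothetical.
std::size_t count_bins_above_tolerance(
    const std::vector<std::complex<double> >& test,
    const std::vector<std::complex<double> >& ref,
    double tol_in_eps)
{
    assert(test.size() == ref.size());

    // Largest magnitude in the reference spectrum.
    double peak = 0.0;
    for (std::size_t k = 0; k < ref.size(); ++k) {
        const double mag = std::abs(ref[k]);
        if (mag > peak) peak = mag;
    }

    const double eps = std::numeric_limits<double>::epsilon();
    const double threshold = tol_in_eps * eps * peak;

    // Count bins whose deviation is larger than the rounding-level threshold.
    std::size_t count = 0;
    for (std::size_t k = 0; k < ref.size(); ++k) {
        if (std::abs(test[k] - ref[k]) > threshold) ++count;
    }
    return count;
}
```

A count of zero for a modest `tol_in_eps` would suggest that the extra entries are ordinary rounding noise. A non-zero count identifies bins worth inspecting more closely.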

              It's no problem at the moment. I'll have a closer look, though, and if I can trace it back to the minor differences, I'll post again.