clAmdFft: Small differences in r2c transformation compared to fftw?

Question asked by gugi on Jan 14, 2013
Latest reply on Jan 15, 2013 by gugi

Hi everyone,

I am doing some real-to-complex transformations using the clAmdFft library, version 1.8.239, with double precision enabled.

When I compare the OpenCL output with the results from the fftw library (version 3.3.2), I get almost the same values, but not exactly.

Here's the output of a 16x16 r2c transformation of a cosine using fftw:

-8.411709e-020+0.000000e+000i    -1.262446e-019+1.556167e-018i    -1.532043e-018+-1.570092e-019i    -7.150663e-019+-8.849248e-019i    1.000000e-003+-2.671338e-018i    7.150663e-019+-8.849248e-019i    1.532043e-018+-1.570092e-019i    1.262446e-019+1.556167e-018i    8.411709e-020+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    

And here's the output of clAmdFft (on either CPU or GPU; it doesn't matter, the result is the same):

-8.411709e-020+0.000000e+000i    -1.262446e-019+1.556167e-018i    -1.532043e-018+-1.570092e-019i    -7.150663e-019+-8.849248e-019i    1.000000e-003+-2.602085e-018i    7.150663e-019+-8.849248e-019i    1.532043e-018+-1.570092e-019i    1.262446e-019+1.556167e-018i    8.411709e-020+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    9.629650e-035+-1.925930e-034i    9.629650e-035+4.814825e-035i    9.629650e-035+1.925930e-034i    0.000000e+000+0.000000e+000i    -9.629650e-035+1.925930e-034i    -9.629650e-035+4.814825e-035i    -9.629650e-035+-1.925930e-034i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    
0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    0.000000e+000+0.000000e+000i    

As you can see, the results match quite well. However, there are some non-zero elements in the 9th line (Nyquist frequency), and in the first line the imaginary part of the fifth element also differs (-2.671338e-018 with fftw vs. -2.602085e-018 with clAmdFft). I know the deviations are quite small, but I observed some notable differences in my simulation for larger transformations (1024x1024, for example). Plus, I was wondering whether this might simply be a bug.
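
One way to put deviations like these into perspective is to scale the largest absolute difference by the largest magnitude in the reference spectrum and compare the result with double-precision machine epsilon. The following is only a minimal sketch of that check, not part of the attached example: the helper name `max_rel_error` is hypothetical, and the values are the first output row of each library as printed above.

```python
import sys


def max_rel_error(reference, candidate):
    """Largest |ref - cand|, divided by the largest |ref| in the spectrum."""
    if len(reference) != len(candidate):
        raise ValueError("spectra must have the same length")
    peak = max(abs(r) for r in reference)
    if peak == 0.0:
        raise ValueError("reference spectrum is all zeros")
    worst = max(abs(r - c) for r, c in zip(reference, candidate))
    return worst / peak


# First output row from fftw, copied from the listing in this post.
fftw_row = [
    complex(-8.411709e-20, 0.0),
    complex(-1.262446e-19, 1.556167e-18),
    complex(-1.532043e-18, -1.570092e-19),
    complex(-7.150663e-19, -8.849248e-19),
    complex(1.000000e-03, -2.671338e-18),
    complex(7.150663e-19, -8.849248e-19),
    complex(1.532043e-18, -1.570092e-19),
    complex(1.262446e-19, 1.556167e-18),
    complex(8.411709e-20, 0.0),
]

# First output row from clAmdFft: identical at the printed precision,
# except for the imaginary part of the fifth element.
cl_row = list(fftw_row)
cl_row[4] = complex(1.000000e-03, -2.602085e-18)

err = max_rel_error(fftw_row, cl_row)
print("max relative error:", err)
print("below double epsilon:", err < sys.float_info.epsilon)
```

For this row the ratio comes out at roughly 7e-17, which is below double-precision epsilon (about 2.2e-16). This says nothing by itself about the larger 1024x1024 case; the same check would have to be run on that data.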

So my question is: is this some sort of bug, am I doing something wrong, or is this simply an effect of the applied algorithm?

(I attached example source code that reproduces this behaviour. It creates the files fft_fftw.dat and fft_cl.dat, which contain the results posted above. Naturally, it requires fftw, OpenCL and clAmdFft to build.)

System specs: Win7 64, Visual Studio 2010 C++ x64 Build, Intel Q6600, AMD HD5850 (latest beta-driver), OpenCL 1.2 AMD-APP (1084.2)
