18 Replies Latest reply on Aug 21, 2013 11:52 PM by msft

    Different results between clAmdFft and fftw.

    msft

      I have different results between clAmdFft and fftw.


      System:

      Radeon HD 7750

      Ubuntu 12.04 LTS

      clAmdFft-1.10.321.tar.gz


      Code:

      int main()

      {

              OpenCLInit();

              static const size_t N = 32;

              fftw_complex* inputData = (fftw_complex*)fftw_malloc(sizeof(double)*2*N);

              fftw_complex* fftw_fourierData = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*N);

              fftw_plan c2c_fftw = fftw_plan_dft_1d((int)N, (fftw_complex*) inputData,(fftw_complex*) fftw_fourierData, FFTW_FORWARD,FFTW_ESTIMATE);

              for (size_t x = 0; x < N; ++x)

              {

                      inputData[x][0] = x;

                      inputData[x][1] = x;

              }

              fftw_execute(c2c_fftw);

              printf("fftw   %.15e %.15e \n",fftw_fourierData[1][0],fftw_fourierData[1][1]);

              clAmdFftStatus status;

              EFailedStep failedStep;

              clAmdFftPlanHandle c2c = clAMD_plan_c2c_1d(m_context,N,1,&m_cmdQueue,&status,&failedStep);

              cl_int clstatus;

              cl::Buffer inputBuffer(m_context,CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,sizeof(double)*2*N,inputData, &clstatus);

              cl::Buffer fourierBuffer(m_context, CL_MEM_READ_WRITE | CL_MEM_HOST_READ_ONLY, N*sizeof(fftw_complex), NULL, &clstatus);

              status = clAmdFftEnqueueTransform(c2c, CLFFT_FORWARD,1,&m_cmdQueue(),0,NULL,NULL, &inputBuffer(),&fourierBuffer(), NULL);

                      // Read result from device to host.

              fftw_complex fourierData[N];

              clstatus = m_cmdQueue.enqueueReadBuffer(fourierBuffer,CL_TRUE,0,sizeof(fourierData),fourierData);

              printf("AmdFft %.15e %.15e \n",fourierData[1][0],fourierData[1][1]);

              fftw_destroy_plan(c2c_fftw);

              clAmdFftDestroyPlan(&c2c);

              clAmdFftTeardown();

              fftw_free(inputData);

              fftw_free(fftw_fourierData);

              return 0;

      }

       

      Result:

      fftw   -1.784507262017418e+02 1.464507262017418e+02

      AmdFft -1.784507251938066e+02 1.464507243160278e+02

       

      Thank you,
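For anyone who wants a reference value that does not depend on either library, here is a minimal sketch that evaluates a single forward-DFT bin directly for the test input above (x[n] = n + i*n). It assumes the same sign convention as FFTW_FORWARD (exponent -2*pi*i*k*n/N). The helper name `dft_bin_of_ramp` is made up for this illustration and is not part of fftw or clAmdFft.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Naive evaluation of one forward-DFT bin k for the input x[n] = n + i*n,
// carried out entirely in floating-point type T (float or double).
// Hypothetical helper for illustration only.
template <typename T>
std::complex<T> dft_bin_of_ramp(std::size_t N, std::size_t k) {
    assert(N > 0 && k < N);
    const T two_pi = static_cast<T>(6.283185307179586476925286766559L);
    std::complex<T> sum(0, 0);
    for (std::size_t n = 0; n < N; ++n) {
        // Reduce k*n modulo N so the twiddle angle stays within one turn.
        const T angle = -two_pi * static_cast<T>((k * n) % N) / static_cast<T>(N);
        const std::complex<T> twiddle(std::cos(angle), std::sin(angle));
        const std::complex<T> x(static_cast<T>(n), static_cast<T>(n));
        sum += x * twiddle;
    }
    return sum;
}
```

Calling `dft_bin_of_ramp<double>(32, 1)` gives a double-precision reference for the printed bin, and `dft_bin_of_ramp<float>(32, 1)` shows what a purely single-precision evaluation of the same sum looks like.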

        • Re: Different results between clAmdFft and fftw.
          bragadeesh

          Hi,

           

          The numerical error between the two libraries is in an acceptable range. Minor differences are always expected; two different FFT implementations will generally not produce bit-identical answers.

            • Re: Different results between clAmdFft and fftw.
              msft

              Hi,

              I am writing a Lucas-Lehmer test program.

              http://mersenneforum.org/showthread.php?t=18297

              This difference causes a fatal error.
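For context, a minimal sketch of the Lucas-Lehmer test in plain 64-bit integer arithmetic, limited to small exponents. This is not the poster's program. Programs for large exponents typically perform the squaring with an FFT-based multiplication and round each output element to the nearest integer, so an accumulated floating-point error of 0.5 or more in any element corrupts the result. The function name `lucas_lehmer` and the range limit are choices made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Lucas-Lehmer test for the Mersenne number M_p = 2^p - 1, where p is an
// odd prime. Restricted to 3 <= p <= 31 so that s * s fits in 64 bits.
bool lucas_lehmer(unsigned p) {
    assert(p >= 3 && p <= 31);
    const std::uint64_t m = (std::uint64_t{1} << p) - 1;
    std::uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; ++i) {
        // s = s^2 - 2 (mod m); adding m first keeps the value non-negative.
        s = (s * s + m - 2) % m;
    }
    return s == 0; // M_p is prime exactly when the final residue is zero
}
```

Because every iteration feeds its result into the next squaring, a single wrong digit anywhere makes the final residue wrong, which is why small FFT errors matter so much here.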

               


               

                • Re: Different results between clAmdFft and fftw.
                  bragadeesh

                  Hi,

                   

                  What value N does it fail for? Is N=40 or N=32 or does it fail irrespective of N? What about N=16?

                  What is your desired tolerance in terms of RMSE? Or can you report the maximum relative error between corresponding numerical values that you are observing?

                   

                  Upon closer inspection, I suspect a slight numerical issue in the library, but it may not happen for all N. I'll respond once I have a definite answer.
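The two metrics asked about above (RMSE and maximum relative error) can be computed along these lines. This is only a sketch: `ErrorStats` and `compare_spectra` are names invented here, and the inputs are assumed to be the fftw result (as the reference) and the clAmdFft result copied into `std::complex<double>` vectors of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

struct ErrorStats {
    double rmse;           // root-mean-square of |test - ref| over all bins
    double max_rel_error;  // largest |test - ref| / |ref| over bins with ref != 0
};

// Compare a test spectrum against a reference spectrum of the same length.
ErrorStats compare_spectra(const std::vector<std::complex<double>>& ref,
                           const std::vector<std::complex<double>>& test) {
    assert(ref.size() == test.size() && !ref.empty());
    double sum_sq = 0.0;
    double max_rel = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double err = std::abs(test[i] - ref[i]);
        sum_sq += err * err;
        const double mag = std::abs(ref[i]);
        if (mag > 0.0) {
            max_rel = std::max(max_rel, err / mag);
        }
    }
    return {std::sqrt(sum_sq / static_cast<double>(ref.size())), max_rel};
}
```

Bins whose reference value is exactly zero are skipped in the relative measure, since the ratio is undefined there; they still contribute to the RMSE.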

                • Re: Different results between clAmdFft and fftw.
                  Bdot

                  The error magnitude would suggest that the calculations have been performed using single precision.

                   

                  I have not used clAmdFft before, but in the code I have not seen any reference to the required double precision (except for the allocation sizes). Is it necessary to request DP somewhere?

                   

                  If it already was double precision, then an error this big is certainly a problem and definitely an FFT implementation error. (It would be really bad to pay the performance cost of double-precision calculation while getting an error closer to single precision.)
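To put a number on the error magnitude, here is a small sketch that computes the relative difference for the N = 32 values printed in the original post and makes it easy to compare against the machine epsilons from `<cfloat>`. The constant and function names are made up for this illustration; the four values are copied from the post.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Relative difference of a test value against a nonzero reference value.
double relative_error(double ref, double test) {
    assert(ref != 0.0);
    return std::fabs(test - ref) / std::fabs(ref);
}

// Bin 1 for N = 32, copied from the "Result" lines of the original post.
const double kFftwRe = -1.784507262017418e+02;
const double kFftwIm = 1.464507262017418e+02;
const double kAmdRe = -1.784507251938066e+02;
const double kAmdIm = 1.464507243160278e+02;
```

`relative_error(kFftwRe, kAmdRe)` comes out at roughly 6e-9 and `relative_error(kFftwIm, kAmdIm)` at roughly 1.3e-8. That is many orders of magnitude above `DBL_EPSILON` (about 2.2e-16) and somewhat below `FLT_EPSILON` (about 1.2e-7), so these two numbers alone do not settle which precision the kernel actually used.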

                    • Re: Different results between clAmdFft and fftw.
                      bragadeesh

                      I am investigating; thanks for your post.

                      • Re: Different results between clAmdFft and fftw.
                        Bdot

                        I found in the docs (revision 1.10):

                        CLAMDFFTAPI clAmdFftStatus clAmdFftSetPlanPrecision (clAmdFftPlanHandle plHandle, clAmdFftPrecision precision)

                         

                        with precision as

                        enum clAmdFftPrecision

                        This is the expected precision of each FFT. Strides and Pitches.

                        Enumerator

                        CLFFT_SINGLE — An array of complex numbers, with real and imaginary

                        components as floats (default).

                        CLFFT_DOUBLE — An array of complex numbers, with real and imaginary

                        components as doubles.

                        CLFFT_SINGLE_FAST — Faster implementation preferred.

                        CLFFT_DOUBLE_FAST — Faster implementation preferred.

                         

                        However, the clAmdFftSetPlanPrecision doc says this (which would be really bad):

                         

                        Currently, only CLFFT_SINGLE and CLFFT_SINGLE_FAST are supported.

                         

                        Whereas the corresponding chapter reads:

                        1.3.3 Supported Precisions in clAmdFft

                        Both CLFFT_SINGLE and CLFFT_DOUBLE precisions are supported by the library

                        for all supported radices. With both of these enums the host computer’s math

                        functions are used to produce tables of sines and cosines for use by the OpenCL

                        kernel.

                        Both CLFFT_SINGLE_FAST and CLFFT_DOUBLE_FAST are meant to generate faster

                        kernels with reduced accuracy, but are disabled in the current build.

                         

                        So the doc contradicts itself ... msft, could you test again after adding the clAmdFftSetPlanPrecision call right after the plan is created:

                         

                                clAmdFftPlanHandle c2c = clAMD_plan_c2c_1d(m_context,N,1,&m_cmdQueue,&status,&failedStep);

                                cl_int clstatus;

                                clstatus = clAmdFftSetPlanPrecision(c2c, CLFFT_DOUBLE); //(or CLFFT_DOUBLE_FAST)

                                // check clstatus!

                          • Re: Different results between clAmdFft and fftw.
                            bragadeesh

                            Yes, thanks for pointing that out; the documentation needs to be updated. It will be addressed in the next release update.

                            We support both CLFFT_SINGLE and CLFFT_DOUBLE. From msft's code, I see that he is using double precision. I am waiting for his response on what values of N cause failures.

                              • Re: Re: Different results between clAmdFft and fftw.
                                msft

                                Hi,

                                Sorry for the delay; I was traveling.

                                I wrote a test program.

                                It tests lengths from 2097152 to 4194304

                                and prints the first element that differs.

                                 

                                 

                                0.799$ sh -x ./run.sh

                                + g++ -c main.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.10.321/include/

                                + g++ -c clFFTPlans.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.10.321/include/

                                + g++ main.o clFFTPlans.o /opt/clAmdFft-1.10.321/lib64/libclAmdFft.Runtime.so -lOpenCL -lfftw3

                                + export LD_LIBRARY_PATH=:/opt/clAmdFft-1.10.321/lib64/:/opt/clAmdFft-1.10.321/lib64/

                                + time ./a.out

                                + cat result

                                Using device: Capeverde

                                fftw  N=2097152 x=0 2.199022206976000e+12 2.199022206976000e+12

                                AmdFft N=2097152 x=0 1.048575500000000e+06 1.048575500000000e+06

                                fftw  N=2359296 x=0 2.783137628160000e+12 2.783137628160000e+12

                                AmdFft N=2359296 x=0 1.179647500000000e+06 1.179647500000000e+06

                                fftw  N=2621440 x=0 3.435972526080000e+12 3.435972526080000e+12

                                AmdFft N=2621440 x=0 1.310719500000000e+06 1.310719500000000e+06

                                fftw  N=2949120 x=0 4.348652912640000e+12 4.348652912640000e+12

                                AmdFft N=2949120 x=0 1.474559500000000e+06 1.474559500000000e+06

                                fftw  N=3145728 x=0 4.947800752128000e+12 4.947800752128000e+12

                                AmdFft N=3145728 x=0 1.572863500000000e+06 1.572863500000000e+06

                                fftw  N=3276800 x=0 5.368707481600000e+12 5.368707481600000e+12

                                AmdFft N=3276800 x=0 1.638399500000000e+06 1.638399500000000e+06

                                fftw  N=3538944 x=0 6.262060548096000e+12 6.262060548096000e+12

                                AmdFft N=3538944 x=0 1.769471500000000e+06 1.769471500000000e+06

                                fftw  N=3932160 x=0 7.730939166720000e+12 7.730939166720000e+12

                                AmdFft N=3932160 x=0 1.966079500000000e+06 1.966079500000000e+06

                                fftw  N=4194304 x=0 8.796090925056000e+12 8.796090925056000e+12

                                AmdFft N=4194304 x=0 2.097151500000000e+06 2.097151500000000e+06

                                  • Re: Re: Re: Different results between clAmdFft and fftw.
                                    msft

                                    Sorry, 0.799.tar.bz2 has a bug.

                                     

                                     

                                    0.80$ sh -x ./run.sh

                                     

                                     

                                    Using device: Capeverde

                                    fftw  N=2097152 x=1 6.999697936137424e+11 -6.999718907657423e+11

                                    AmdFft N=2097152 x=1 6.999697851429686e+11 -6.999718872570181e+11

                                    fftw  N=2359296 x=1 8.858994174985317e+11 -8.859017767945317e+11

                                    AmdFft N=2359296 x=1 8.858994067777078e+11 -8.859017723538044e+11

                                    fftw  N=2621440 x=1 1.093703130201767e+12 -1.093705751641767e+12

                                    AmdFft N=2621440 x=1 1.093703116966181e+12 -1.093705746159389e+12

                                    fftw  N=2949120 x=1 1.384218208481750e+12 -1.384221157601750e+12

                                    AmdFft N=2949120 x=1 1.384218191730461e+12 -1.384221150663118e+12

                                    fftw  N=3145728 x=1 1.574932822063575e+12 -1.574935967791575e+12

                                    AmdFft N=3145728 x=1 1.574932803004330e+12 -1.574935959896955e+12

                                    fftw  N=3276800 x=1 1.708911550540556e+12 -1.708914827340555e+12

                                    AmdFft N=3276800 x=1 1.708911529859950e+12 -1.708914818774345e+12

                                    fftw  N=3538944 x=1 1.993274574108351e+12 -1.993278113052351e+12

                                    AmdFft N=3538944 x=1 1.993274549986492e+12 -1.993278103060725e+12

                                    fftw  N=3932160 x=1 2.460833025994630e+12 -2.460836958154630e+12

                                    AmdFft N=3932160 x=1 2.460832996214557e+12 -2.460836945819292e+12

                                    fftw  N=4194304 x=1 2.799881271608540e+12 -2.799885465912540e+12

                                    AmdFft N=4194304 x=1 2.799881237725435e+12 -2.799885451877668e+12

                                     

                                     

                                    The accuracy is not sufficient.