4 Replies. Latest reply on Nov 13, 2015 10:55 AM by gpgpucoder

    trouble getting ACML fftw wrapper working




      I'm having trouble getting the ACML FFTW wrapper working. Today I linked ACML into a 64-bit Windows project in Visual Studio 2012, intending to try the FFTW wrappers on the GPU.


      The plan returned by fftw_plan_dft_r2c_2d seems to be zero (NULL), and the subsequent fftw_execute crashes in acml_bridge. I had put all the DLLs, including the ACML DLLs and libfftw-3.3.4.dll / libfftwf-3.3.4.dll, into c:\mydlls and added that folder to my PATH. So the program loads OK but doesn't work correctly.


      What might I be doing wrong?


      Edit: a couple of minor additions:

      1) I also meant to ask: are there no longer 32-bit versions of ACML? It seems 32-bit support was dropped at some point, but I can't easily find any notes about that.

      2) The URL in the ACML package's "ViewKnowledgeBase.url" shortcut gives "404 Ooops! Page not found".





        • Re: trouble getting ACML fftw wrapper working

          I'm not sure how, but I got past this issue. I'll post the details if I can reconstruct the mistake.


          I have follow-up questions on performance and functionality.


          Functionality: I modified the function heuristic in fft_real.lua and fft_complex.lua to always return true. I did this for both AMD cards I have, which are in two different systems: "Hawaii" and "Default" (Turks). My intent was simply to force GPU execution on unconditionally.

          * Are there other options, or does that cover all the functionality of AMD's FFTW interface?

          * Now that I have it working, I'm not sure how to force the GPU off entirely. I changed the heuristic to just return false, but I can see the GPU still lights up (watching it with Afterburner). I haven't yet profiled to see whether real kernels are running in this case. When I do the same thing on the Turks card, it always crashes.

          * Is there a way to save the GPU code ACML generates and reload it on a subsequent invocation? It takes a full second to compile.


          Performance: so far it's about 100% slower than FFTW 3.3.4. In CodeXL I can see a huge number of buffer transfers interleaved with the kernels. Is this typical of this FFTW interface? I'll try passing in some larger datasets. I'm getting the feeling I should perhaps work with clFFT directly.

            • Re: trouble getting ACML fftw wrapper working

              A note on the "acml_bridge" crash I saw earlier: it seems to happen when "resources" isn't a subfolder of the folder containing the ACML DLLs (the "resources" directory is normally found under ifort64\lib). In my original configuration I had copied the ACML DLLs plus a few other files to a completely separate folder and added that folder to my PATH, but I had not included the "resources" folder, which is probably why the crash happened in the first place. I had fixed it by adding the full path of the ACML DLL folder to the PATH, but ran into the crash again after an editing mistake.
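              Since the crash seems to come down to folder layout, a small pre-flight check could catch it before ACML is loaded. This is a hypothetical helper, not part of ACML: has_resources_dir() only reports whether "<dll_dir>/resources" exists as a directory, based on the layout described above, and does not check whether ACML needs anything else from that folder. Windows generally accepts '/' as a path separator, so the same string works there.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef S_ISDIR  /* MSVC's <sys/stat.h> provides S_IFMT/S_IFDIR but not S_ISDIR */
#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)
#endif

/* Hypothetical pre-flight check (not an ACML function): returns 1 if
   <dll_dir>/resources exists and is a directory, 0 otherwise. */
int has_resources_dir(const char *dll_dir)
{
    static const char suffix[] = "/resources";
    char path[4096];
    struct stat st;
    size_t len = strlen(dll_dir);

    if (len + sizeof suffix > sizeof path)
        return 0;                               /* path too long to check */
    memcpy(path, dll_dir, len);
    memcpy(path + len, suffix, sizeof suffix);  /* also copies the trailing '\0' */

    if (stat(path, &st) != 0)
        return 0;                               /* missing or inaccessible */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

              Calling has_resources_dir("c:/mydlls") before the first FFTW call would, per the configuration described above, have flagged the missing folder instead of crashing later in acml_bridge.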


              I would still like to know whether there is a way to force the GPU off entirely; I'm not totally sure what it's doing each time. The results I got on Turks did not match what I got from FFTW3 at all. I'll probably just try an older ACML without the GPU support.

                • Re: trouble getting ACML fftw wrapper working

                  To give a brief answer: the files fft_complex.lua and fft_real.lua contain the control logic that directs execution to the GPU or the CPU. FFT computation on the CPU is performed by the FFTW library, which the user has to obtain and install. If you make the function 'heuristic' in these Lua files a blanket 'return false', all execution will be performed on the CPU.

                    • Re: trouble getting ACML fftw wrapper working

                      Unfortunately I didn't see this when you posted it. I'm glad you replied, but I must point out that it was a rather late response, and we basically had to set this whole thing aside. Our current efforts have moved to the CUDA FFTW interface.


                      What you wrote didn't address any of the problems I was encountering, such as the fact that 'return false' was seemingly *not* doing what you say it should do.


                      If I understand correctly, this is all open source now, which is a strong reason to tackle this again in the future.