Not sure how but I got past this issue. I will post the details if I can reconstruct the mistake.
I have follow-up questions on performance and functionality.
Functionality: I modified the heuristic function in fft_real.lua and fft_complex.lua to always return true, and I did that for both AMD cards I have, on two different systems: "Hawaii" and "Default" (Turks). My intent was to just blindly force the GPU path on.
* Are there other options, or would that cover all functionality of AMD's FFTW interface?
* Now that I have it working, I'm not sure what to do to force the GPU off entirely. I changed the heuristic to just return false, but I can see the GPU still lights up (using Afterburner to watch it). I haven't profiled yet to see whether real kernels are running in this case, though. When I did the same thing on the Turks card, it always crashed.
* Is there a way to save the GPU code ACML generates and get it to reload on a subsequent invocation? It takes a full second to compile.
Performance: So far it's about 100% slower than fftw 3.3.4. With CodeXL I can see there are a huge number of buffer transfers intermingled with all the kernels. Is this typical of this FFTW interface? I'll be trying to pass in some larger datasets. I'm getting the feeling I should perhaps work with clFFT directly.
Note on the "acml_bridge" crash I had seen earlier: it seems to happen if "resources" (normally found under ifort64\lib) isn't a child folder of the folder containing all the ACML DLLs. In my original configuration I had copied the ACML DLLs plus some other things to a completely separate folder and added that to my PATH, but I had not included the "resources" folder. This is probably why it happened to me in the first place. I had solved it by putting the full path to the ACML DLLs into the PATH, but ran into the crash again after an editing mistake.
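For anyone hitting the same crash, here is a rough sketch of how the folder layout described above could be checked. It walks the PATH, finds directories that contain an ACML DLL, and reports those without a "resources" child folder. The function names and the "acml" substring used to recognise the DLLs are my own assumptions for illustration, not anything taken from ACML itself.

```python
import os
from pathlib import Path


def find_dll_dirs(name_fragment, path_value=None):
    """Return PATH directories holding a .dll whose name contains name_fragment.

    name_fragment is a hypothetical matcher (e.g. "acml"); adjust it to the
    DLL names actually present in your install.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    fragment = name_fragment.lower()
    hits = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        directory = Path(entry)
        try:
            if not directory.is_dir():
                continue
            names = [f.name.lower() for f in directory.iterdir()]
        except OSError:
            # Unreadable PATH entries are skipped rather than treated as errors.
            continue
        if any(fragment in n and n.endswith(".dll") for n in names):
            hits.append(directory)
    return hits


def missing_resources(name_fragment, path_value=None):
    """Return the DLL directories that lack a 'resources' child folder."""
    return [d for d in find_dll_dirs(name_fragment, path_value)
            if not (d / "resources").is_dir()]


if __name__ == "__main__":
    for d in missing_resources("acml"):
        print(f"{d}: ACML DLL found, but no 'resources' folder next to it")
```

This only checks the layout; it does not confirm that a missing folder is the actual cause of a given crash.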
I would still like to know if there is a way to force the GPU off entirely. I'm not totally sure what it's doing each time. The results I got on Turks did not match what I got from fftw3 at all. I will probably just try an older ACML without the GPU stuff.
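To put a number on "did not match", one option is to compare both outputs against an independent reference. The sketch below is only an illustration with hypothetical helper names: it computes a naive O(n^2) DFT using the same convention as FFTW's forward transform (negative exponent, no normalisation) and a relative L2 error between two result vectors.

```python
import cmath
import math


def naive_dft(x):
    """Unnormalised forward DFT: X[j] = sum_k x[k] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def relative_error(result, reference):
    """Relative L2 error of result against reference (same length assumed)."""
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(result, reference)))
    den = math.sqrt(sum(abs(b) ** 2 for b in reference))
    # Fall back to the absolute error when the reference is all zeros.
    return num / den if den else num


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    reference = naive_dft(signal)
    # Replace this with the output copied back from the library under test.
    candidate = list(reference)
    print(relative_error(candidate, reference))
```

For a correct double-precision transform the error should sit close to machine precision; an error near 1 or above points to wrong data rather than rounding differences.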
To give a brief answer: the files fft_complex.lua and fft_real.lua have the control logic that directs execution to the GPU or the CPU. FFT computation on the CPU is performed using the FFTW library, which the user has to obtain and install. In the function 'heuristic' in the Lua files, if you make a blanket 'return false', then all execution will be performed on the CPU.
Unfortunately I didn't see this at the time you posted the reply. I'm glad you posted a reply, but I must point out that it was a rather late response and we basically had to set this whole thing aside. Our current efforts have moved to the CUDA fftw.
What you wrote didn't address any of the problems I was encountering, such as the fact that 'return false' was seemingly *not* doing what you say it should do.
If I understand correctly, this is all open source now, which is a strong reason to tackle this again in the future.