I have tested FFT performance using the ACML-GPU library, and I found that the FFT runs very slowly on my ATI card.
For instance, an FFT on 2M single-precision complex values takes about 140 ms. I also ran an FFT of the same size on an NVIDIA card, where it takes only 39 ms. My NVIDIA card is also much cheaper than the ATI card.
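To put the two timings on a common scale, here is a rough sketch that converts them to an effective throughput. It assumes "2M" means 2^21 points and uses the conventional 5 * N * log2(N) operation-count estimate for a complex FFT; both are assumptions for illustration, not figures reported by either library, and the helper name `fft_gflops` is made up for this example.

```python
import math

def fft_gflops(n_points, seconds):
    """Effective GFLOP/s using the conventional 5*N*log2(N) estimate."""
    flops = 5.0 * n_points * math.log2(n_points)
    return flops / seconds / 1e9

N = 2 ** 21                    # assumption: "2M" points means 2^21
ati = fft_gflops(N, 0.140)     # 140 ms measured on the ATI card
nvidia = fft_gflops(N, 0.039)  # 39 ms measured on the NVIDIA card

print(f"ATI:    {ati:.2f} GFLOP/s")
print(f"NVIDIA: {nvidia:.2f} GFLOP/s")
print(f"Ratio:  {nvidia / ati:.1f}x")
```

Under these assumptions the ATI run works out to roughly 1.6 GFLOP/s and the NVIDIA run to roughly 5.6 GFLOP/s, a gap of about 3.6x.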
So, is this because the FFT in ACML-GPU is not GPU-accelerated?
You are correct. The FFT is not GPU-accelerated in ACML-GPU 1.0.