
Problem with large FFTs using clAmdFft

Question asked by sadrian on Apr 12, 2012
Latest reply on May 30, 2012 by kbrafford

I am using clAmdFftClient-1.6.244 with the -d and -o options to generate an out-of-place kernel with a default FFT size of 1024, and a single file, clAmdFft.kernel.Stockham1.cl, is output. Both the forward and reverse FFTs produce what I expect, and the workgroup size controls the batch (the number of FFTs processed per kernel invocation). On a large buffer I measure a throughput of 20000 MB/s (2500 MS/s) on a GPU and 560 MB/s (70 MS/s) on 12 CPUs. Except for the low performance on the CPUs, everything seems to be OK with the 1024-point FFT.
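
For reference, the MS/s figures follow from the MB/s figures if each sample occupies 8 bytes, i.e. an interleaved complex value made of two 32-bit floats. That sample size is an assumption inferred from the ratio of the numbers quoted here, and the helper name below is made up for illustration:

```python
# Assumption: one sample = one complex value stored as two 32-bit floats.
BYTES_PER_SAMPLE = 8

def mb_per_s_to_ms_per_s(mb_per_s, bytes_per_sample=BYTES_PER_SAMPLE):
    """Convert a byte throughput (MB/s) to a sample throughput (MS/s)."""
    return mb_per_s / bytes_per_sample

gpu_ms = mb_per_s_to_ms_per_s(20000)  # GPU figure quoted in the post
cpu_ms = mb_per_s_to_ms_per_s(560)    # 12-CPU figure quoted in the post
print(gpu_ms, cpu_ms)
```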

 

For larger FFTs, starting with 16384 points, the kernel generator writes two files, clAmdFft.kernel.Stockham2.cl and clAmdFft.kernel.Stockham3.cl. Neither kernel gives the output I expect. I also tried running one followed by the other, in case it is supposed to be a two-stage calculation, but I still did not get a correct answer. Can anyone shed some light on this?
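
To make the "two-stage" idea concrete, here is a minimal pure-Python sketch of the textbook way a length N = N1*N2 FFT is split into two passes: short transforms over strided input, a twiddle multiplication, then a second set of short transforms written out in transposed order. This is the generic Cooley-Tukey decomposition, not the code clAmdFft generates; whether Stockham2 and Stockham3 follow this exact data layout is an assumption I have not verified. The function names and the small sizes are chosen only for illustration.

```python
import cmath

def dft(x):
    """Naive O(n^2) forward DFT, used as the reference and as the short transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def two_pass_fft(x, n1, n2):
    """Length n1*n2 DFT computed as two passes of shorter DFTs."""
    n = n1 * n2
    assert len(x) == n
    # Pass 1: for each offset c, a length-n1 DFT over x[c], x[c+n2], ...,
    # followed by the twiddle factor exp(-2*pi*i*c*k1/n).
    stage1 = []
    for c in range(n2):
        col = dft(x[c::n2])
        stage1.append([col[k1] * cmath.exp(-2j * cmath.pi * c * k1 / n)
                       for k1 in range(n1)])
    # Pass 2: for each k1, a length-n2 DFT across the offsets c.
    # Output index k = k1 + n1*k2, i.e. the result is stored transposed.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([stage1[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out

# Small check: 32 points split as 4 x 8 (16384 could likewise split as 128 x 128).
x = [complex(i % 5, -(i % 3)) for i in range(32)]
reference = dft(x)
result = two_pass_fft(x, 4, 8)
max_err = max(abs(a - b) for a, b in zip(reference, result))
print(max_err)
```

In this scheme the output of either pass alone is not the FFT of the input, and chaining the passes only works if the intermediate buffer is strided and indexed exactly as the second pass expects.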

 
