In the newest APP SDK, is the OpenCL implementation still slower than Apple's implementation?
I think this is going to depend entirely on the size and shape of the FFTs that you are computing. Up to and including v1.4 of the library, we have focused primarily on developing a minimal set of useful features and on improving the performance of our 1D transforms. We focus on 1D performance because 1D is the primitive used for both 2D and 3D FFTs, so those also benefit from this work. We have improved the performance of our 1D transforms in each release, so if this fits your problem domain, you may be happy with the results.
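To illustrate why 1D work carries over to higher dimensions: a 2D DFT can be computed as 1D transforms along every row followed by 1D transforms along every column (the row-column method). The pure-Python sketch below shows that identity on a tiny matrix; it is not the library's code, the naive `dft` merely stands in for whatever 1D kernel is used, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) forward 1D DFT; stands in for the library's 1D primitive."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def fft2_row_column(a):
    """2D DFT of a row-major matrix (list of rows) built only from 1D transforms."""
    rows = [dft(r) for r in a]                  # 1D transform of every row
    cols = [dft(list(c)) for c in zip(*rows)]   # 1D transform of every column
    return [list(r) for r in zip(*cols)]        # transpose back to row-major order


def dft2_direct(a):
    """Reference 2D DFT computed straight from the definition."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]


a = [[1, 2, 3], [4, 5, 6]]
got, ref = fft2_row_column(a), dft2_direct(a)
print(max(abs(got[u][v] - ref[u][v]) for u in range(2) for v in range(3)))
```

The printed difference should be at rounding-error level, and the `[0][0]` entry of either result is the sum of all inputs (21). A 3D transform extends the same idea with a third pass of 1D transforms.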
Going forward, we will focus on the performance of our 2D and 'large 1D' transforms, as well as on adding features. On a Cayman or Cypress device, any 1D FFT of length greater than 2K is currently treated as a 'pseudo 2D' FFT, because it is larger than can fit within the device's LDS (local data share). Of course, this threshold depends on the amount of LDS a device has available, and our library queries the device for this amount, so value cards and unannounced cards may change the threshold at which a 1D FFT is treated like a 'pseudo 2D' FFT.
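One well-known way to split a 1D FFT that is too large for on-chip memory into smaller pieces is the "four-step" (Bailey) decomposition: for N = N1·N2, view the data as an N1×N2 matrix, FFT the columns, multiply by twiddle factors, FFT the rows, and read the result out transposed. The sketch below assumes, without confirmation from the library, that the 'pseudo 2D' path works along these lines; it is plain Python, not the library's kernels, and the function names are hypothetical. For example, a 32K transform factors as 32768 = 128 × 256, and both factors are below the 2K threshold mentioned above.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT (len(x) must be a power of two).
    Stands in for a 'small' FFT that would fit entirely in LDS."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def four_step_fft(x, n1, n2):
    """Hypothetical 'pseudo 2D' 1D FFT of length n1*n2 using only small FFTs."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: treat x[n2*i1 + i2] as an n1 x n2 matrix; FFT each column (length n1).
    cols = [fft_radix2([x[n2 * i1 + i2] for i1 in range(n1)]) for i2 in range(n2)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * i2*k1 / n).
    for i2 in range(n2):
        for k1 in range(n1):
            cols[i2][k1] *= cmath.exp(-2j * cmath.pi * i2 * k1 / n)
    # Step 3: for each k1, FFT across i2 (length n2).
    rows = [fft_radix2([cols[i2][k1] for i2 in range(n2)]) for k1 in range(n1)]
    # Step 4: transposed read-out, X[k1 + n1*k2] = rows[k1][k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


x = [complex(i % 5, (3 * i) % 7) for i in range(64)]
print(max(abs(a - b) for a, b in zip(four_step_fft(x, 8, 8), fft_radix2(x))))
```

The printed error should be at rounding level. The transpose and the twiddle pass are extra trips through global memory, which is one plausible reason a 'large 1D' FFT costs more than one that fits in LDS.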
It would be nice to get feedback from users on what problem sizes they are interested in, so please reply and leave a message stating the FFT parameters that you are interested in using. We can incorporate feedback like this into our test and performance suites.
Hi kknox, thanks for asking.
I mostly use 1D complex 32K FFTs, single precision, in batches of 12.
Thanks one and all for your replies; it is great to know that 32K is such an important sample length. We will investigate what we can do.
FFT size 32K, complex floats, batched 1D transforms.
Edit: I also have one special case: a forward FFT of 32K complex floats where all input data is either -1.0 or +1.0.
Tristan23, Raistmer, can you comment on what you use the 32K point FFT for?
In my case I also mostly use size 2^15 (32K), 1D.
It would be great to see some improvement here, since this seems to be a fairly common case.