Tristan23
Journeyman III

APP-SDK-v2.5: FFT still slower than Apple's implementation?

Hello AMD,

in the newest APP-SDK, is the OpenCL FFT implementation still slower than Apple's implementation?

cheers,

T.

0 Likes
8 Replies
kknox
Staff

Hi Tristan~

I think this will depend entirely on the size and shape of the FFTs that you are computing.  Up to and including v1.4 of the library, we have focused primarily on developing a minimal set of useful features and improving the performance of our 1D transforms.  We focused on 1D performance because it is a primitive used by both 2D and 3D FFTs, so they also benefit from this work.  We have improved the performance of our 1D transforms in each release, so if this fits your problem domain you may be happy.

Going forward, we will focus on the performance of our 2D and 'large 1D' transforms, as well as on adding features.  On a Cayman or Cypress device, any 1D FFT of length greater than 2K is currently treated as a 'pseudo 2D' FFT because it is larger than can fit within the device's LDS.  Of course, this number is dependent on the amount of LDS that a device has available, and our library does query the device for this amount, so value cards and unannounced cards may change the threshold at which a 1D FFT is treated like a 'pseudo 2D' FFT.
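
For readers unfamiliar with the 'pseudo 2D' idea, here is a minimal sketch of the textbook "four-step" decomposition, which computes a long 1D FFT as two passes of shorter FFTs plus a twiddle multiplication. This is an illustration only, assuming a plain N = n1 * n2 split; it is not taken from the AMD library, and the function names `dft` and `pseudo_2d_fft` are hypothetical. The naive `dft` stands in for the short transform that would fit in LDS.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; stands in for a short transform that fits in local memory."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def pseudo_2d_fft(x, n1, n2):
    """Compute a length n1*n2 DFT by viewing x as an n1 x n2 row-major matrix."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: a length-n1 DFT down each of the n2 columns -> cols[c][k1].
    cols = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]

    # Step 2: multiply element (k1, c) by the twiddle factor exp(-2*pi*i*c*k1/n).
    rows = [
        [cols[c][k1] * cmath.exp(-2j * cmath.pi * c * k1 / n) for c in range(n2)]
        for k1 in range(n1)
    ]

    # Step 3: a length-n2 DFT along each of the n1 rows -> rows[k1][k2].
    rows = [dft(row) for row in rows]

    # Step 4: transposed write-out, X[k1 + n1*k2] = rows[k1][k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

Calling `pseudo_2d_fft(x, 4, 8)` on a 32-element list should agree with `dft(x)` up to rounding error. On a GPU, each short transform would be done in local memory and the twiddle and transpose steps would go through global memory, which is one reason large 1D sizes can be slower than small ones.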

It would be nice to get feedback from users on what problem sizes they are interested in, so please reply and leave a message stating the FFT parameters that you are interested in using.  We can incorporate feedback like this into our test and performance suites.

0 Likes

Originally posted by: kknox

It would be nice to get feedback from users on what problem sizes they are interested in, so please reply and leave a message stating the FFT parameters that you are interested in using.  We can incorporate feedback like this into our test and performance suites.

Hi kknox, thanks for asking.

I am mostly using complex 32K FFTs, single precision. 1D. Batches of 12.

Card: 6970
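
To put these parameters in numbers, here is a small sketch of the arithmetic. It assumes 4-byte floats for single precision and uses the 2K threshold that kknox quoted above for Cayman and Cypress devices (the 6970 is a Cayman card). The helper names are hypothetical, and the splits listed are simply the power-of-two factorizations that would keep both passes within that threshold, not necessarily what the library chooses.

```python
def fft_batch_bytes(length, batch, bytes_per_component=4):
    """Bytes needed for a batch of complex transforms (real + imaginary parts)."""
    return length * batch * 2 * bytes_per_component


def splits_within(length, max_len):
    """Power-of-two factor pairs (n1, n2), n1 <= n2, with both factors <= max_len."""
    result = []
    n1 = 2
    while n1 * n1 <= length:
        n2 = length // n1
        if length % n1 == 0 and n2 <= max_len:
            result.append((n1, n2))
        n1 *= 2
    return result


total = fft_batch_bytes(32 * 1024, 12)        # 3145728 bytes, i.e. 3 MiB
candidates = splits_within(32 * 1024, 2048)   # (16, 2048) through (128, 256)
```

So a single 32K transform is 16 times longer than the 2K threshold, which means this workload always goes down the 'pseudo 2D' path on this card.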

0 Likes

Thanks one and all for your replies; it is great to know that 32K is such an important sample length.  We will investigate what we can do.

0 Likes
Raistmer
Adept II

All power-of-2 FFT sizes from 8 up to 128k (especially 32k), complex samples, batches of 1D FFT transforms.
0 Likes

FFT size 32k, complex floats, batches of 1D

Edit: Plus I also have one special case: Forward FFT, 32k complex floats, where all input data is either -1.0 or +1.0

0 Likes

Tristan23, Raistmer,  can you comment on what you use the 32K point FFT for?

0 Likes

Originally posted by: golgo_13

Tristan23, Raistmer,  can you comment on what you use the 32K point FFT for?



Input complex data of +/-1 is my case too.
I use the 32k FFT for the AstroPulse application. It is also one of the sizes used in the MultiBeam application.
0 Likes

In my case I also mostly use size 2^15 (32k), 1D.

It would be great to see some improvement here, since this seems to be a fairly common case.

0 Likes