OpenCL

gouse
Journeyman III

clAmdFft: why 3K case is much slower than 4K?

Hi,

I'm experimenting with clAmdFft and see that the FFT of a 3072x3072 image is way slower than that of a 4096x4096 image (the 4096 case was run with 'clAmdFft.Client.exe -g -o -x 4096 -y 4096 -p 20'):

  • 230ms for 3072x3072 image
  • 33ms for 4096x4096

What causes roughly 7 times better performance with about 1.8 times more data? I guess it is related to the size being 2^n, but in what way?

Thanks.

bragadeesh
Staff

Re: clAmdFft: why 3K case is much slower than 4K?

Transform lengths of 4096 (and other pure powers of 2) have been optimized better than lengths with mixed factors (in this case 3072 = 2^10 x 3, so it has a '3' in it). Lengths with mixed factors need more optimization work in the library.
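
To see which lengths fall into the "pure power of 2" group and which have mixed factors, a length can be factored before choosing an image size. The following is a minimal sketch in plain Python; the helper names `prime_factors` and `is_power_of_two` are made up for illustration and are not part of clAmdFft, and the code says nothing about how the library picks its kernels internally.

```python
def prime_factors(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left after trial division is itself prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_power_of_two(n):
    """True when n has no prime factor other than 2."""
    return n > 0 and n & (n - 1) == 0


for length in (3072, 4096):
    print(length, prime_factors(length), is_power_of_two(length))
# 3072 {2: 10, 3: 1} False
# 4096 {2: 12} True
```

If padding the data is acceptable for the application, rounding a mixed-factor length such as 3072 up to the next power of 2 is one way to stay on the better-optimized path, at the cost of transforming more data.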