I'm experimenting with clAmdFft and see that the FFT of a 3072x3072 image is much slower than that of a 4096x4096 image (ran with 'clAmdFft.Client.exe -g -o -x 4096 -y 4096 -p 20'):
- 230 ms for the 3072x3072 image
- 33 ms for the 4096x4096 image
What causes roughly 7 times better performance with about 1.8 times more data? I guess it is related to the size being a power of two (2^n), but in what way exactly?
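
For reference, here is a quick sanity check of the arithmetic behind the question. It is only a sketch in plain Python and does not touch clAmdFft at all; the helper names `prime_factors` and `is_power_of_two` are mine. It shows that 4096 is a pure power of two while 3072 also contains a factor of 3, and it computes the data and timing ratios from the numbers above. It does not explain why the library is slower for that size.

```python
def prime_factors(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left is a prime factor larger than sqrt of the original n.
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_power_of_two(n):
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and n & (n - 1) == 0


# Ratios taken from the measurements quoted in the question.
data_ratio = (4096 * 4096) / (3072 * 3072)  # pixels in the larger image vs the smaller
time_ratio = 230 / 33                       # slower time vs faster time

for size in (3072, 4096):
    print(size, prime_factors(size), "power of two:", is_power_of_two(size))

print(f"data ratio: {data_ratio:.2f}x, time ratio: {time_ratio:.2f}x")
```

So 3072 = 2^10 * 3 and 4096 = 2^12, which is the only structural difference between the two sizes that I can see.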