I need to perform an FFT on 8192 complex doubles using OpenCL.
A fast implementation would need 128 KB of local memory (8192 × 16 bytes). Does such a GPU exist? Thank you :-).
Unfortunately, you will be limited to 32 KB of local memory.
So you'll have to modify your algorithm so that its working set fits into 32 KB.
The AMD APP SDK includes an FFT sample, so you might want to start there.
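One possible way to fit into 32 KB is to split the 8192-point transform into smaller sub-FFTs whose tiles are only a few KB each. The following is a minimal CPU-side sketch in Python of that decomposition, intended as a reference for checking a kernel rather than as an OpenCL implementation. The function names (`fft_radix2`, `four_step_fft`, `tile_bytes`) and the 64 × 128 split are assumptions made for illustration; this is not necessarily how the AMD sample is structured.

```python
import cmath

BYTES_PER_COMPLEX_DOUBLE = 16  # two 8-byte doubles (real, imaginary)


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def four_step_fft(x, n1, n2):
    """FFT of length n1*n2 built only from sub-FFTs of length n2 and n1."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 independent FFTs of length n2 over the strided slices
    # x[a], x[a + n1], x[a + 2*n1], ...
    rows = [fft_radix2(x[a::n1]) for a in range(n1)]
    # Step 2: multiply element (a, k2) by the twiddle factor exp(-2*pi*i*a*k2/n).
    for a in range(n1):
        for k2 in range(n2):
            rows[a][k2] *= cmath.exp(-2j * cmath.pi * a * k2 / n)
    # Step 3: n2 independent FFTs of length n1 down the columns.
    out = [0j] * n
    for k2 in range(n2):
        col = fft_radix2([rows[a][k2] for a in range(n1)])
        # Step 4: output bin for (k1, k2) is n2*k1 + k2.
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


def tile_bytes(points):
    """Bytes needed to hold `points` complex doubles."""
    return points * BYTES_PER_COMPLEX_DOUBLE
```

With this split, `four_step_fft(x, 64, 128)` only ever transforms 128 or 64 points at a time, so `tile_bytes(128)` is 2048 bytes and `tile_bytes(64)` is 1024 bytes per sub-FFT, compared with `tile_bytes(8192)` = 131072 bytes for the whole array. On a GPU, each sub-FFT could plausibly be handled by one work-group that stages its tile in local memory, with the data passing through global memory between steps.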