I am using your clAmdFft APP library to do a 32K FFT. Unfortunately, performance is not as good as with Apple's OpenCL FFT implementation (http://developer.apple.com/library/mac/#samplecode/OpenCL_FFT/Introduction/Intro.html).
I am calling your API functions with the default settings and parameters. Is there a way to speed things up? Which settings would you recommend?