A good place to start is the OpenCL FFT sample in the AMD APP SDK, available from AMD Developer Central: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/
You can even take the OpenCL C portion of the sample (the kernel code) and compile it into HSAIL using the CLOC tool, then modify the generated HSAIL if needed. The CLOC tool is hosted on GitHub under HSAFoundation/CLOC.
After that, all that is left is writing the runtime portion of the code. The OpenCL FFT sample should be a useful reference here as well.