Hi,
I am using an AMD Radeon HD 6770 graphics card for an OpenCL application. It has 800 stream processors. I have installed the clAMDfft library and developed an application for an 8000-point FFT. I want to execute ten 8000-point FFTs in parallel on the GPU. What would be the best way to execute the kernels on the same device? Should I partition the device into subdevices, or can I execute the kernels using different command queues? Please suggest the best method.
Thanks
Hi,
The library can do multiple transforms efficiently. You can call 'clAmdFftSetPlanBatchSize' and specify a value of 10 for the 'batchSize' parameter. Make sure you set up the input/output buffers correctly: allocate space for all 10 transforms and fill in the input data for each of them. The library will then do all 10 transforms at once, as efficiently as possible on the GPU.
Device fission is not supported on GPUs at the moment. You can use more queues, but the kernels will be serialized. Of course, the work-items of a single kernel are executed in parallel across all 800 cores.
Thank you, Nou & Bragadeesh. Can you please give some example code for using the batch size for three FFTs?
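For a batch of three, a rough sketch of the host-side buffer setup might look like the following. It assumes single-precision, complex-interleaved data with the three transforms packed back to back, so the distance between consecutive transforms equals the FFT length. The helper names (batch_float_count, batch_index, make_batch_input) and the test input are made up for illustration and are not part of the library. The clAmdFft calls are shown only in a comment: they are untested here, omit all error checking, and assume that plan, context, queue and buffer have already been created.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FFT_LENGTH 8000 /* points per transform, as in the question */
#define BATCH_SIZE 3    /* number of transforms done in one call */

/* Number of floats needed for the whole batch
 * (2 floats per complex sample: real, imaginary). */
size_t batch_float_count(size_t fft_length, size_t batch_size)
{
    return 2 * fft_length * batch_size;
}

/* Index of the real part of sample n in transform b;
 * the imaginary part is stored at the returned index + 1. */
size_t batch_index(size_t fft_length, size_t b, size_t n)
{
    return 2 * (b * fft_length + n);
}

/* Allocate one host buffer holding all transforms back to back.
 * Illustrative input only: transform b is filled with the constant
 * (b + 1) + 0i. The caller must free() the returned pointer. */
float *make_batch_input(size_t fft_length, size_t batch_size)
{
    float *data = calloc(batch_float_count(fft_length, batch_size),
                         sizeof *data);
    if (data == NULL)
        return NULL;
    for (size_t b = 0; b < batch_size; ++b)
        for (size_t n = 0; n < fft_length; ++n)
            data[batch_index(fft_length, b, n)] = (float)(b + 1);
    return data;
}

/* After copying the host buffer into one cl_mem buffer, the plan setup
 * would look roughly like this (assumed sequence, not compiled here):
 *
 *   size_t lengths[1] = { FFT_LENGTH };
 *   clAmdFftCreateDefaultPlan(&plan, context, CLFFT_1D, lengths);
 *   clAmdFftSetLayout(plan, CLFFT_COMPLEX_INTERLEAVED,
 *                     CLFFT_COMPLEX_INTERLEAVED);
 *   clAmdFftSetResultLocation(plan, CLFFT_INPLACE);
 *   clAmdFftSetPlanBatchSize(plan, BATCH_SIZE);
 *   clAmdFftSetPlanDistance(plan, FFT_LENGTH, FFT_LENGTH);
 *   clAmdFftBakePlan(plan, 1, &queue, NULL, NULL);
 *   clAmdFftEnqueueTransform(plan, CLFFT_FORWARD, 1, &queue,
 *                            0, NULL, NULL, &buffer, NULL, NULL);
 */
```

With this layout, transform 0 occupies floats 0 to 15999, transform 1 starts at float 16000, and transform 2 at float 32000. The same plan should work for ten transforms by changing BATCH_SIZE and allocating a correspondingly larger buffer.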