Archives Discussions

anu9anna · 12-11-2012

Hi,

I am using an AMD Radeon HD 6770 graphics card for an OpenCL application. It has 800 stream processors. I have installed the clAmdFft library and developed an application for an 8000-point FFT. I want to execute ten 8000-point FFTs in parallel on the GPU. What would be the best way to execute the kernels on the same device? Should I partition the device into sub-devices, or can I execute the kernels using different command queues? Please suggest the best method.

Thanks

bragadeesh · 12-11-2012

Hi,

The library can do multiple transforms efficiently. You can use 'clAmdFftSetPlanBatchSize' and specify a value of 10 for the 'batchSize' parameter. Make sure you set up the input/output buffers correctly: allocate space for all 10 transforms and fill in the input data for each one. The library will then do all 10 transforms in a single batched call, as efficiently as possible on the GPU.
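
To make the buffer setup concrete, here is a minimal host-side sketch in plain C of how the input for a batch could be laid out. It assumes single-precision interleaved complex data and that the transforms sit back to back in one buffer, so the distance between consecutive transforms equals the FFT length; if your plan uses a different spacing, 'clAmdFftSetPlanDistance' is the call to look at. The helper names (batch_float_count, batch_index, pack_batch) are made up for illustration and are not part of clAmdFft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of floats needed for `batch` transforms of `len` complex points,
 * stored as interleaved (re, im) pairs and packed back to back. */
size_t batch_float_count(size_t len, size_t batch)
{
    return 2 * len * batch;
}

/* Index of the real part of sample k in transform b.
 * The imaginary part lives at the returned index + 1.
 * Assumes the distance between transforms is exactly `len` complex points. */
size_t batch_index(size_t b, size_t k, size_t len)
{
    return 2 * (b * len + k);
}

/* Pack `batch` real-valued signals of `len` samples each into one
 * interleaved-complex host buffer (imaginary parts set to zero).
 * Returns a malloc'd buffer the caller must free, or NULL on failure. */
float *pack_batch(const float *const *signals, size_t len, size_t batch)
{
    assert(signals != NULL && len > 0 && batch > 0);

    float *buf = malloc(batch_float_count(len, batch) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    for (size_t b = 0; b < batch; ++b) {
        for (size_t k = 0; k < len; ++k) {
            size_t i = batch_index(b, k, len);
            buf[i]     = signals[b][k]; /* real part */
            buf[i + 1] = 0.0f;          /* imaginary part */
        }
    }
    return buf;
}
```

For ten 8000-point transforms, pack_batch(signals, 8000, 10) gives a host buffer of batch_float_count(8000, 10) floats. That buffer would be copied into a single cl_mem object of the same size and transformed by a plan whose batch size was set to 10.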

nou · 12-11-2012

Device fission is not supported on GPUs at the moment. You can use more queues, but the kernels will still be serialized. Of course, the work-items of a single kernel are executed in parallel across all 800 cores.

anu9anna · 12-11-2012

Thank you, nou and bragadeesh. Could you please give some example code that uses the batch size for three FFTs?

Multiple fft kernels