
anu9anna

Multiple FFT kernels

Hi,

I am using an AMD Radeon HD 6770 graphics card for an OpenCL application; it has 800 stream processors. I have installed the clAMDfft library and developed an application for an 8000-point FFT. I want to execute ten 8000-point FFTs in parallel on the GPU. What would be the best way to execute the kernels on the same device? Should I partition the device into sub-devices, or can I execute the kernels using different command queues? Please suggest the best method.

Thanks

nou

Device fission is not supported on GPUs at the moment. You can use multiple command queues, but the kernels will be serialized. Of course, the work-items of a single kernel are executed in parallel across all 800 cores.


Hi,

The library can perform multiple transforms efficiently. You can call 'clAmdFftSetPlanBatchSize' and pass a value of 10 for the 'batchSize' parameter. Make sure you set up the input/output buffers correctly: allocate space for all 10 transforms and fill in the input data for each one. The library will then perform all 10 transforms at once, as efficiently as possible on the GPU.

Thank you, Nou and Bragadeesh. Could you please give some example code that uses the batch size for three FFTs?
