delgadom
Journeyman III

Performance of reduction kernel

I am benchmarking the AMD Stream SDK v1.1 on my Radeon HD2400 using the samples shipped with Brook+. Performance is awesome except for the tests that use reduction kernels. For example, setting

BRT_RUNTIME=cpu

and the call to the compiled example reduction.br as

./reduction -t -x 1024 -y 1024 -i 100

results in an execution time of 0.024000s. However, with the same call after setting

BRT_RUNTIME=cal

the time increases to 1.218000s, and the load on the CPU is not negligible.

Is the reduction buffer implemented for my Radeon? If yes, do you have any idea of how to increase the performance of the reduction?

Carlos

 

2 Replies

Delgadom,
The hardware reduction buffer is not used by Brook. Reduction is done with a multi-pass algorithm: every pass shrinks the stream by a factor between 2 and 8 until it reaches the requested size. Because reductions in Brook are written to be as generic as possible, the easiest way to increase performance is to write your own reduction implementation that is tailored specifically to your application's needs.
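
To make the multi-pass idea concrete, here is a minimal CPU-side sketch in Python. It is not Brook's actual implementation, only an illustration of the scheme described above: each pass combines groups of up to `factor` neighbouring elements, so the data shrinks by roughly that factor per pass. The function name `multipass_reduce` and its parameters are hypothetical and chosen only for this example.

```python
def multipass_reduce(values, factor=4, op=lambda a, b: a + b):
    """Reduce `values` to a single element in several passes.

    Illustrative only: each pass shrinks the data by up to `factor`,
    mimicking a generic multi-pass reduction. Returns (result, passes).
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    current = list(values)
    if not current:
        raise ValueError("cannot reduce an empty sequence")

    passes = 0
    while len(current) > 1:
        shrunk = []
        # One pass: every output element combines up to `factor` inputs.
        for start in range(0, len(current), factor):
            chunk = current[start:start + factor]
            acc = chunk[0]
            for x in chunk[1:]:
                acc = op(acc, x)
            shrunk.append(acc)
        current = shrunk
        passes += 1
    return current[0], passes


if __name__ == "__main__":
    data = list(range(1024))
    for k in (2, 8):
        total, n_passes = multipass_reduce(data, factor=k)
        print(f"factor={k}: sum={total}, passes={n_passes}")
```

With 1024 elements, a factor of 2 needs 10 passes while a factor of 8 needs 4. On a GPU every pass is a separate kernel launch with its own overhead, so a reduction tuned to your data size and operator can save time by using fewer, larger passes.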


Thank you!
