Brook does not use the hardware reduction buffer. Instead, a reduction is implemented as a multi-pass algorithm: each pass shrinks the stream by a factor of 2 to 8, and passes repeat until the stream reaches the requested size. Because reductions in Brook are written to be as generic as possible, the easiest way to improve performance is to write your own reduction, tailored specifically to your application's needs.
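To make the multi-pass idea concrete, here is a rough CPU-side sketch in plain Python. It is not Brook's actual implementation; the names `reduce_pass` and `multipass_reduce` are made up for illustration, and it assumes a fixed per-pass factor, an associative operator, and an input length that is an exact multiple of the requested output size.

```python
def reduce_pass(stream, block_len, factor, op):
    """One pass: inside each block of `block_len` elements, fold every
    group of up to `factor` adjacent elements into a single element."""
    out = []
    for block_start in range(0, len(stream), block_len):
        block = stream[block_start:block_start + block_len]
        for start in range(0, len(block), factor):
            group = block[start:start + factor]  # last group may be shorter
            acc = group[0]
            for value in group[1:]:
                acc = op(acc, value)
            out.append(acc)
    return out


def multipass_reduce(stream, op, target_size=1, factor=4):
    """Repeat reduce_pass until `target_size` elements remain.

    Hypothetical helper: `factor` stands in for the per-pass shrink
    factor (2 to 8) described above. Returns (result, pass_count).
    """
    if not 2 <= factor <= 8:
        raise ValueError("factor must be between 2 and 8")
    if target_size < 1 or not stream or len(stream) % target_size != 0:
        raise ValueError("stream length must be a non-zero multiple of target_size")

    # Each output element is the reduction of one contiguous input block.
    block_len = len(stream) // target_size
    passes = 0
    while block_len > 1:
        stream = reduce_pass(stream, block_len, factor, op)
        block_len = -(-block_len // factor)  # ceiling division
        passes += 1
    return stream, passes


if __name__ == "__main__":
    data = list(range(1, 65))  # 64 elements
    print(multipass_reduce(data, lambda a, b: a + b, target_size=1, factor=4))
    print(multipass_reduce(data, lambda a, b: a + b, target_size=4, factor=4))
```

With 64 elements and a factor of 4, reducing to a single value takes three passes (64, 16, 4, 1), while reducing to four values takes two. A hand-written reduction for a specific application can pick the factor and the number of passes to suit its own stream sizes, which is where the performance gain would come from.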