Slow Reduction Kernel

Discussion created by dinaharchery on Sep 12, 2009
Latest reply on Sep 14, 2009 by Gipsel

Please help,

I apologize that this topic is so similar to the other one I posted, but this is a very specific question that I hope someone who has run into the same problem can help me with. It is driving me insane, and I am hoping it is just a simple issue that us newbies run into.

I am implementing a matrix-vector multiplication operation (similar to the one included with the Brook+ samples), and everything seems to work great except for a large bottleneck in the reduction kernel. Is there a way to speed up the reduction kernel, or should I write my own? If so, how? Hints and ideas are welcome; both my code and I are slow.

Relevant code is attached. Thank you to anyone with any ideas or simple code example(s).

// Call Kernel(s):
gatherMult(aStrm, xStrm, indices, tmpMat);

// THIS IS THE SLOW-DOWN ====> The "REDUCTION" Kernel
sumRows(tmpMat, yStrm);

kernel void gatherMult(float a<>, float b[], float index<>, out float result<>)
{
    result = a*b[index];
}

reduce void sumRows(float nzValues<>, reduce float result<>)
{
    result += nzValues;
}
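For anyone who wants to check the output of the two kernels, here is a minimal plain-C sketch of what I believe they compute on the CPU. It rests on assumptions not stated in the post: tmpMat is taken to be a row-major rows x width array (one padded row of nonzeros per matrix row, with padding values of 0), the indices are held as int rather than float, and the names gather_mult_ref and sum_rows_ref are hypothetical helpers, not part of Brook+.

```c
#include <stddef.h>

/* CPU reference for gatherMult: tmp[i] = a[i] * x[index[i]].
 * Assumption: a, index and tmp are row-major arrays of rows * width
 * entries; padded entries have a[i] == 0 and any in-range index. */
void gather_mult_ref(const float *a, const float *x, const int *index,
                     float *tmp, size_t rows, size_t width)
{
    for (size_t i = 0; i < rows * width; ++i)
        tmp[i] = a[i] * x[index[i]];
}

/* CPU reference for sumRows: y[r] is the sum of row r of tmp.
 * Assumption: the reduction collapses each row of width entries
 * into a single output element. */
void sum_rows_ref(const float *tmp, float *y, size_t rows, size_t width)
{
    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (size_t c = 0; c < width; ++c)
            acc += tmp[r * width + c];
        y[r] = acc;
    }
}
```

Comparing yStrm against the result of sum_rows_ref on a small matrix should show whether the slow reduction is at least producing the expected values before any tuning is attempted.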