Archives Discussions

BarsMonster · ‎03-07-2009

Hi, I'm getting excellent performance from my application on the Brook platform, but the very first Brook kernel call takes ~4000 ms.

The kernel function itself has around 25 streams, and the amount of data transferred per call is around 1 MB. The second and later calls are fast and perfect (~90 ms).

Any clues?

Right now I need to measure the performance of my application, so I have to make a first "dummy" call and benchmark the second one :-S
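A minimal sketch of that warm-up-then-measure pattern, in case it helps anyone reading later. The helper `bench_with_warmup` and the `BenchResult` struct are hypothetical names, not part of Brook+; `call` stands in for whatever code launches the kernel and waits for its results (if the launch is asynchronous, the callable has to block until the output is available, otherwise the timings mean little).

```cpp
#include <chrono>
#include <functional>

// Timings in milliseconds for a cold first call and the warm calls after it.
struct BenchResult {
    double first_call_ms;  // includes any one-time setup cost (compile, caches)
    double steady_avg_ms;  // average over the later calls
};

// Times `call` once (cold), then `repeats` more times (warm).
// `call` is a hypothetical stand-in for "launch the kernel and wait for results".
inline BenchResult bench_with_warmup(const std::function<void()>& call, int repeats) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    const auto t0 = clock::now();
    call();  // first call: reported separately, not mixed into the average
    const auto t1 = clock::now();

    for (int i = 0; i < repeats; ++i) {
        call();  // warm calls
    }
    const auto t2 = clock::now();

    BenchResult r;
    r.first_call_ms = ms(t1 - t0).count();
    r.steady_avg_ms = repeats > 0 ? ms(t2 - t1).count() / repeats : 0.0;
    return r;
}
```

Reporting the first call separately, rather than silently discarding it, also makes it easy to see how large the one-time cost is.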

gaurav_garg · ‎03-07-2009

The Brook+ kernel call implementation uses various caches, which is why you see a speed-up from the second kernel call onward. There is no way to avoid the slow first kernel call.
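To illustrate the general idea (this is not the actual Brook+ implementation, just a sketch of a compile-on-first-use cache): the first lookup pays for the expensive step, and later lookups reuse the stored result. `KernelCache` and the `compile` callback are hypothetical names invented for this example.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Illustrative compile-on-first-use cache, keyed by kernel source text.
// The "compiled" form is modeled as a string purely for demonstration.
class KernelCache {
public:
    using CompileFn = std::function<std::string(const std::string&)>;

    explicit KernelCache(CompileFn compile) : compile_(std::move(compile)) {}

    // Returns the compiled form, compiling only if this source was not seen before.
    const std::string& get(const std::string& source) {
        auto it = cache_.find(source);
        if (it == cache_.end()) {
            ++compile_count_;  // slow path: taken once per distinct kernel
            it = cache_.emplace(source, compile_(source)).first;
        }
        return it->second;     // fast path on every later call
    }

    int compile_count() const { return compile_count_; }

private:
    CompileFn compile_;
    std::map<std::string, std::string> cache_;
    int compile_count_ = 0;
};
```

With a cache like this, the cost of the expensive step shows up only in the first call for each kernel, which matches the pattern described above.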

empty_knapsack · ‎03-07-2009

Those 4000 ms are probably just eaten by the calclCompile routine. For large kernels, compilation speed becomes a real problem. Try grabbing your kernel code from the Brook+-generated *.cpp file and compiling it on its own to find out.

BarsMonster · ‎03-08-2009

You're right, I have quite a huge kernel. This kinda sucks.

The CUDA port gets executed almost instantly :-S

Thanks for your replies.

gsteri1 · ‎09-03-2009

I ran the Black-Scholes example in the brook directory. It seems to indicate that the GPU calculations are slower than the ones on the CPU. Has anyone seen this, or is something misconfigured on my box (openSUSE AMD64/ATI 4850 HD)?

Thanks,

-Greg

riza_guntur · ‎09-03-2009

Use a larger input. I've seen an improvement with 1 million up to 3 million input samples for Black-Scholes.
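A rough way to think about it, as a sketch under a simple linear cost model (an assumption, not measured data): the GPU pays a fixed overhead per run (transfers, launch, setup) but has a lower per-sample cost, so it only wins above some input size. The function name and parameters below are hypothetical.

```cpp
#include <limits>

// Assumed linear cost model (illustrative only):
//   cpu_time(n) = n * cpu_per_item_ms
//   gpu_time(n) = fixed_overhead_ms + n * gpu_per_item_ms
// Returns the input size above which the GPU is faster under this model,
// or infinity if the GPU's per-item cost is not lower than the CPU's.
inline double break_even_items(double fixed_overhead_ms,
                               double cpu_per_item_ms,
                               double gpu_per_item_ms) {
    if (cpu_per_item_ms <= gpu_per_item_ms) {
        return std::numeric_limits<double>::infinity();
    }
    return fixed_overhead_ms / (cpu_per_item_ms - gpu_per_item_ms);
}
```

Plugging in timings measured on your own box gives a quick estimate of how many samples are needed before the GPU version pays off.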

gsteri1 · ‎09-05-2009

Yes, I do see the speed up now. Thank you. -Greg

gsteri1 · ‎09-03-2009

I ran the example up to 200k replications. I will try what you suggest.

Thank you,

-Greg

riza_guntur · ‎09-03-2009

Originally posted by: BarsMonster
You're right, I have quite a huge kernel. This kinda sucks.

The CUDA port gets executed almost instantly :-S
Thanks for your replies.

CUDA has offline kernel compilation. I'm not sure why no one included it in the Brook+ compilation step; it must be for the sake of forward compatibility.

First Brook call is sloooooooowww