Raistmer
Adept II

How costly is a kernel call?

How many CPU cycles does it take to prepare a kernel launch?

I use pretty simple kernels but call them many times in my program.

The CPU-backend performance of the Brook version is worse than the pure CPU version, but the CAL-backend performance is even worse!

Performance degrades many-fold when running on the GPU (both elapsed and CPU times).

I use an HD4870 for benchmarking, which is not the slowest card, so this result is pretty discouraging.

When I added RDTSC-based counters to see which kernel took the longest, all counters returned approximately the same mean tick count, no matter which kernel was running.

This suggests that the actual running time of my simple kernels is very low and is completely hidden by kernel-launch preparation, which takes the vast majority of the running time.

So the question is: is there any information on how much CPU time a very simple kernel call (for example, stream A + stream B) takes?

What is the recommended kernel length for offloading to be worthwhile (i.e. to decrease the app's running time instead of increasing it)?

 

sambucuself
Journeyman III

I think that calling a kernel, with all the steps necessary to perform that operation, is very inefficient in CPU time if the kernel is "too short" or the execution domain is too small.

You should try working with relatively large streams and perform as much calculation as you can with as few memory operations as possible, so that you avoid bottlenecks.

 

I'm working on some technical calculations related to stream kernel programming, and those are my conclusions.

Yes, but are there any numerical estimates?

The stream (domain) size is restricted by the size of the data array being processed, and sometimes it is pretty small... I will try to enlarge the kernel itself.

Gipsel
Adept I

Originally posted by: Raistmer

I use pretty simple kernels but call them many times in my program.


That's inherently bad. Simple kernels are never compute-bound, and the calling overhead will kill the performance.

Originally posted by: Raistmer

When I added RDTSC-based counters to see which kernel took the longest, all counters returned approximately the same mean tick count, no matter which kernel was running.

This suggests that the actual running time of my simple kernels is very low and is completely hidden by kernel-launch preparation, which takes the vast majority of the running time.

So the question is: is there any information on how much CPU time a very simple kernel call (for example, stream A + stream B) takes?

What is the recommended kernel length for offloading to be worthwhile (i.e. to decrease the app's running time instead of increasing it)?



I've seen a figure of about 20 µs overhead per kernel call somewhere, but I think it was for the CAL interface; I never measured it myself. The Brook+ layer will add a bit on top of that. I try to write kernels that run for several milliseconds (or several tens of ms). Copying a lot of data to the GPU before a kernel and back after it also causes a major slowdown. It's better to leave all results in GPU memory if possible.

A kernel as simple as adding two arrays is only useful as an intermediate step between complex kernels (copying the arrays to the GPU just to add them there is definitely slower than doing it on the CPU). If possible, such operations should be merged into the kernel that runs before or after them.

Actually, all the data already resides on the GPU; the only transfers are within GPU memory. Enlarging one of the kernels (putting the loop inside the kernel instead of calling the kernel in a loop) has already given a big performance boost. It seems that's the way to go.

 
