
rick_weber
Adept II

Documentation clarification

In the APP guide on page 1-12, the following paragraph makes no sense:

"When using these interfaces, it is important to consider the amount of copying involved. There is a two-copy processes: between host and PCIe, and between PCIe and GPU compute device. This is why there is a large performance difference between the system GFLOPS and the kernel GFLOPS."

What does data transfer latency have to do with floating point performance?


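System GFLOPS is the floating-point throughput of the entire application, memory transfer times included: the floating-point work divided by the total elapsed time.
Kernel GFLOPS is the throughput of the kernel itself: the same work divided by the kernel execution time alone.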
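A minimal Python sketch of the distinction, assuming you already have the total floating-point operation count and separately timed host-to-device copy, kernel, and device-to-host copy durations. The function names and the numbers in the example are hypothetical, not taken from the APP guide:

```python
def gflops(flop_count: float, seconds: float) -> float:
    """Floating-point operations per second, in units of 10^9."""
    return flop_count / seconds / 1e9


def kernel_and_system_gflops(flop_count: float, h2d_s: float,
                             kernel_s: float, d2h_s: float):
    """Return (kernel GFLOPS, system GFLOPS) for one copy -> kernel -> copy run.

    The numerator (the floating-point work) is identical in both figures;
    only the denominator differs. The system figure also counts the
    host-to-device (h2d) and device-to-host (d2h) transfer times.
    """
    kernel = gflops(flop_count, kernel_s)
    system = gflops(flop_count, h2d_s + kernel_s + d2h_s)
    return kernel, system
```

For example, `kernel_and_system_gflops(1e12, 0.25, 0.5, 0.25)` returns `(2000.0, 1000.0)`: the amount of math is the same, but the two copies double the elapsed time and therefore halve the system figure.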

Oh, so this basically means that if I do

copy
kernel
copy

the time spent on the whole sequence is longer than the time spent on the kernel alone. I was thrown off by the specific term "FLOPS", since memory transfers involve no floating-point operations.


Yes, it is confusing, but quite correct. Achieved FLOPS (as opposed to theoretical FLOPS) measures the floating-point work completed per unit of time: it is expressed per second, but it need not be measured over a single second. Unlike theoretical FLOPS, it is a snapshot, and depending on the window over which you take that snapshot, the value can range from the theoretical maximum down to 0 FLOPS (e.g. a window that covers only a single memory read with no calculation). System FLOPS should therefore be measured over a period that covers at least one full cycle of all the operations involved (copy, compute, etc.), and the approximation improves as the measurement period T approaches infinity.
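To illustrate the windowing point, here is a toy Python model. It is an assumption for illustration, not a real profiler: the workload repeats a cycle of copy time with no arithmetic followed by compute time at a fixed peak rate, and "measured FLOPS" is the work completed inside a window divided by the window length.

```python
def flops_done(t: float, copy_s: float, compute_s: float,
               peak_flops: float) -> float:
    """Cumulative floating-point ops completed by time t (t >= 0).

    Hypothetical model: each cycle is `copy_s` seconds of transfer (no math)
    followed by `compute_s` seconds running at `peak_flops`.
    """
    cycle = copy_s + compute_s
    full_cycles, into_cycle = divmod(t, cycle)
    # Compute time already elapsed inside the current, unfinished cycle.
    partial = max(0.0, into_cycle - copy_s)
    return (full_cycles * compute_s + partial) * peak_flops


def measured_flops(t0: float, window_s: float, copy_s: float,
                   compute_s: float, peak_flops: float) -> float:
    """Average FLOPS observed over the window [t0, t0 + window_s]."""
    done = (flops_done(t0 + window_s, copy_s, compute_s, peak_flops)
            - flops_done(t0, copy_s, compute_s, peak_flops))
    return done / window_s
```

With 1 s of copy, 1 s of compute and a peak of 100 FLOPS, the window [0, 0.5] sees 0 FLOPS, the window [1.2, 1.7] sees roughly the peak of 100, any window of exactly one full cycle (e.g. [0.3, 2.3]) sees 50, and a long window such as [0, 1000] also sees 50. A window that is not a whole number of cycles, such as [0, 3], gives about 33.3, and that error shrinks as the window grows.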

What a load of pointless waffle!
