rick_weber
Adept II

Documentation clarification

In the APP guide on page 1-12, the following paragraph makes no sense:

"When using these interfaces, it is important to consider the amount of copying involved. There is a two-copy processes: between host and PCIe, and between PCIe and GPU compute device. This is why there is a large performance difference between the system GFLOPS and the kernel GFLOPS."

What does data transfer latency have to do with floating point performance?

3 Replies
MicahVillmow
Staff

System GFLOPS is the performance of the entire application, including memory transfer times, expressed in GFLOPS.
Kernel GFLOPS is the performance of the kernel alone, expressed in GFLOPS.
rick_weber
Adept II

Oh, so this basically means if I do

copy

kernel

copy

the time spent doing the whole thing is longer than the time spent doing just the kernel. I was thrown off by the use of the specific term "FLOPs" when memory transfers have nothing to do with floating point operations.
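To put numbers on the copy, kernel, copy sequence, here is a minimal sketch. The operation count and all three timings are made up purely for illustration, and `gflops` is a hypothetical helper, not something from the APP guide or the SDK.

```python
def gflops(flop_count, seconds):
    """Achieved GFLOPS: floating point operations completed per second, in billions."""
    return flop_count / seconds / 1e9


# Hypothetical figures, chosen only to illustrate the two metrics.
flop_count = 2.0e9   # floating point operations performed by the kernel
copy_in_s = 0.004    # host -> GPU transfer time, in seconds
kernel_s = 0.002     # kernel execution time, in seconds
copy_out_s = 0.004   # GPU -> host transfer time, in seconds

# Kernel GFLOPS: only the kernel's own run time is in the denominator.
kernel_gflops = gflops(flop_count, kernel_s)

# System GFLOPS: same amount of math, but the copies count toward the time.
system_gflops = gflops(flop_count, copy_in_s + kernel_s + copy_out_s)

print(f"kernel GFLOPS: {kernel_gflops:.1f}")
print(f"system GFLOPS: {system_gflops:.1f}")
```

The copies add no floating point work, only time, so the same operation count gets divided by a larger interval. With these made-up timings the whole sequence takes five times as long as the kernel, so the system figure is one fifth of the kernel figure.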

antzrhere
Adept III

Yes, it is confusing, but quite correct. Actual FLOPs (as opposed to theoretical FLOPs) is a measurement of math work completed per unit of time (the time is measured in seconds, but the measurement window need not be one second). Unlike theoretical FLOPs, it's a snapshot, and depending on when you take this 'snapshot' it can range from theoretical FLOPs (the maximum) down to 0 FLOPs (e.g. when the measured period covers only a single memory read operation with no calculation). System FLOPs should be determined over a period of time that encompasses at least one full cycle of all applied operations (such as copy, calculate etc.), with the approximation improving as T approaches infinity.
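Written out, under the assumption that the application repeats an identical copy, kernel, copy cycle (the symbols below are introduced here for illustration and do not come from the APP guide): let $W$ be the floating point operations performed per cycle, and $t_{\text{in}}$, $t_{\text{kernel}}$, $t_{\text{out}}$ the times of the three stages.

```latex
\[
F_{\text{kernel}} = \frac{W}{t_{\text{kernel}}},
\qquad
F_{\text{system}} = \frac{W}{t_{\text{in}} + t_{\text{kernel}} + t_{\text{out}}}
\le F_{\text{kernel}}
\]
\[
F(T) = \frac{W(T)}{T},
\qquad
\lim_{T \to \infty} F(T) = F_{\text{system}}
\]
```

Here $W(T)$ is the number of operations completed inside a measurement window of length $T$. A window that falls entirely inside a copy gives $F(T) = 0$, a window entirely inside the kernel gives $F_{\text{kernel}}$, and a long window averages out to $F_{\text{system}}$.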

What a load of pointless waffle!
