3 Replies Latest reply on Sep 9, 2011 8:21 PM by antzrhere

    Documentation clarification

    rick.weber

      In the APP guide on page 1-12, the following paragraph makes no sense:

      "When using these interfaces, it is important to consider the amount of copying involved. There is a two-copy processes: between host and PCIe, and between PCIe and GPU compute device. This is why there is a large performance difference between the system GFLOPS and the kernel GFLOPS."

      What does data transfer latency have to do with floating point performance?

        • Documentation clarification
          MicahVillmow
          System GFLOPS is the performance of the entire application, including memory transfer times, in FLOPS.
          Kernel GFLOPS is the performance of the kernel itself in FLOPS.
            • Documentation clarification
              rick.weber

              Oh, so this basically means if I do

              copy

              kernel

              copy

              the time spent doing the whole thing is longer than doing just the kernel. I was thrown off by the use of the specific term "FLOPs" when memory transfers have nothing to do with floating point operations.
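To make the copy / kernel / copy distinction concrete, here is a minimal sketch of the arithmetic. The operation count and all three timings are hypothetical figures chosen only for illustration, and `gflops` is a helper name invented for this example; nothing here comes from the APP guide.

```python
def gflops(flop_count, seconds):
    """Throughput in GFLOPS for flop_count operations completed in seconds."""
    return flop_count / seconds / 1e9


# Hypothetical figures, for illustration only.
flop_count = 2.0e9   # floating-point operations performed by the kernel
t_copy_in = 0.004    # host -> device transfer time, in seconds
t_kernel = 0.002     # kernel execution time, in seconds
t_copy_out = 0.004   # device -> host transfer time, in seconds

# Kernel GFLOPS: same work, divided by the kernel time alone.
kernel_gflops = gflops(flop_count, t_kernel)

# System GFLOPS: same work, divided by copy + kernel + copy.
system_gflops = gflops(flop_count, t_copy_in + t_kernel + t_copy_out)

print(f"kernel GFLOPS: {kernel_gflops:.1f}")
print(f"system GFLOPS: {system_gflops:.1f}")
```

The numerator is identical in both cases; only the time window in the denominator changes, which is why transfers lower the system figure even though they perform no floating point operations.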

                • Documentation clarification
                  antzrhere

                  Yes, it is confusing, but quite correct. Actual FLOPs (as opposed to theoretical FLOPs) is a measurement of math work completed per unit of time (expressed per second, though not necessarily measured over exactly one second). Unlike theoretical FLOPs, it's a snapshot, and depending on when you take this 'snapshot' it can range from theoretical FLOPs (max) down to 0 FLOPs (e.g. when the start/end period covers a single memory read operation with no calculation). System FLOPs should be determined over a period of time that encompasses at least one cycle of all applied operations (such as copy, calculate, etc.), with a better approximation as the measurement period T approaches infinity.
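One way to write down the point about the measurement period, as a sketch under a stated assumption: suppose the workload is steady and periodic, completing $N$ floating point operations in every cycle of length $P$ (copy, calculate, copy). The symbols $W$, $N$, and $P$ are introduced here for illustration and do not appear in the guide.

```latex
% W(T): floating point operations completed inside a window of length T.
F(T) = \frac{W(T)}{T}

% A window holds \lfloor T/P \rfloor full cycles plus a partial cycle
% contributing between 0 and N operations, so
\left| W(T) - \frac{N}{P}\,T \right| \le N
\quad\Longrightarrow\quad
\left| F(T) - \frac{N}{P} \right| \le \frac{N}{T} \xrightarrow[T \to \infty]{} 0
```

Under that assumption, a short window can report anything from 0 (it covers only a transfer) up to the kernel's peak rate, while a long window converges to the per-cycle average $N/P$.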

                  What a load of pointless waffle!