6 Replies Latest reply on Sep 15, 2008 1:42 PM by MicahVillmow

    Some initial simple_matmult results using SDK 1.2 + a 4870?

    jski

      Using a Radeon 4870, I ran:

      ./simple_matmult_d -x 1000 -y 1000 -i 100 -t -p 

      I got:

      Width   Height  Iterations  CPU Total Time  GPU Total Time  Gflops     Speedup
      1000    1000    100         876.598000      10.756000       17.317266  81.498512

      Are these numbers consistent with others' results? (I.e., is this what I should expect?)

      ---jski

        • jski

          Results using double precision:

          ./double_precision_simple_matmult_d -x 1000 -y 1000 -i 100 -p -t

          Width   Height  Iterations  CPU Total Time  GPU Total Time  Gflops     Speedup
          1000    1000    100         1139.733000     12.64600        14.729125  90.125969

          This runs counter to what I expected (a smaller speedup than in single precision), so I have to assume that AMD made some architectural changes to the GPU to improve double-precision performance.

          ---jski

          • moodz

            HD4870

            Q6600 @ 3.419 GHz

            ./simple_matmult_d -x 1000 -y 1000 -i 10 -t -p

            Width   Height  Iterations      GPU Total Time  Gflops         
            1000    1000    10              1.027000        18.136759      

            -p Compare performance with CPU.
            Width   Height  Iterations      CPU Total Time  GPU Total Time  Gflops          Speedup        
            1000    1000    10              71.004000       1.027000        18.136759       69.137293

            I couldn't wait 15 minutes for the CPU test!
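
The three result sets posted so far can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: the times are taken to be seconds, the inner dimension of the multiply is taken to equal the width, each multiply-add is counted as 2 floating-point operations, and "Gflops" is taken to be scaled by 2^30 rather than 10^9. That last point is a guess that happens to reproduce the reported figures; the sample's source is not quoted in this thread. The names `RUNS`, `speedup`, and `gflops` are made up for illustration.

```python
# Figures copied from the posts above:
# (label, width, height, iterations, CPU time, GPU time, reported Gflops, reported speedup)
RUNS = [
    ("jski, single precision", 1000, 1000, 100, 876.598, 10.756, 17.317266, 81.498512),
    ("jski, double precision", 1000, 1000, 100, 1139.733, 12.646, 14.729125, 90.125969),
    ("moodz, single precision", 1000, 1000, 10, 71.004, 1.027, 18.136759, 69.137293),
]


def speedup(cpu_time, gpu_time):
    """Speedup as the ratio of CPU total time to GPU total time."""
    return cpu_time / gpu_time


def gflops(width, height, iterations, gpu_time_s):
    """Throughput estimate for repeated matrix multiplies.

    Assumptions (not confirmed by the thread): inner dimension == width,
    2 flops per multiply-add, and 'giga' meaning 2**30.
    """
    flops = 2.0 * width * height * width * iterations
    return flops / gpu_time_s / 2**30


if __name__ == "__main__":
    for label, w, h, it, cpu, gpu, rep_gflops, rep_speedup in RUNS:
        print(label)
        print("  speedup: computed %.6f, reported %.6f" % (speedup(cpu, gpu), rep_speedup))
        print("  Gflops:  computed %.6f, reported %.6f" % (gflops(w, h, it, gpu), rep_gflops))
```

If the assumptions hold, the computed and reported columns should agree closely, which suggests the Speedup column is simply CPU time divided by GPU time.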

            • MicahVillmow
              moodz, a kernel with a large number of arithmetic operations and relatively few memory accesses will produce results like this.
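
One rough way to put a number on "many arithmetic operations and few memory accesses" is arithmetic intensity, meaning floating-point operations per byte moved. The sketch below assumes an idealized N x N multiply in which each element of the two inputs and the output crosses the memory bus exactly once. That is a lower bound on traffic, not a measurement of what simple_matmult actually does, and the function name `matmul_intensity` is hypothetical.

```python
def matmul_intensity(n, bytes_per_element):
    """Flops per byte for an idealized n x n matrix multiply.

    Assumes 2 flops per multiply-add (2 * n**3 in total) and that the two
    inputs and the output are each moved exactly once (3 * n**2 elements).
    """
    flops = 2.0 * n ** 3
    bytes_moved = 3.0 * n ** 2 * bytes_per_element
    return flops / bytes_moved


if __name__ == "__main__":
    n = 1000
    print("single precision: %.2f flops/byte" % matmul_intensity(n, 4))
    print("double precision: %.2f flops/byte" % matmul_intensity(n, 8))
```

Under these assumptions the intensity grows linearly with N, so larger matrices lean further toward being arithmetic-bound than memory-bound.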