2 Replies Latest reply on Sep 20, 2010 1:40 PM by zeland

    Incorrect calculation of FLOPS in time_dgemm.f

    zeland

      The operation count for DGEMM when alpha and beta are not equal to 1 or 0 is 3*N*K*M, but time_dgemm.f in ACML-GPU (Linux x64) uses DNFLOP = 2.0D-6*DBLE(M)*DBLE(N)*DBLE(K).

      I suppose the correct expression is DNFLOP = 3.0D-6*DBLE(M)*DBLE(N)*DBLE(K).
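To see how much the choice of constant changes the reported rate, here is a minimal Python sketch. It mirrors only the DNFLOP expression quoted above; the function names and the 0.5 s timing are hypothetical and not taken from time_dgemm.f.

```python
def dgemm_mflop(m, n, k, flops_per_inner_step=2.0):
    """Operation count in millions, mirroring DNFLOP = c * 1.0D-6 * M * N * K."""
    return flops_per_inner_step * 1.0e-6 * float(m) * float(n) * float(k)


def reported_mflops(m, n, k, seconds, flops_per_inner_step=2.0):
    """Rate a timing program would print for a run that took `seconds`."""
    return dgemm_mflop(m, n, k, flops_per_inner_step) / seconds


# Hypothetical run: 1000 x 1000 x 1000 problem finishing in 0.5 s.
size = 1000
elapsed = 0.5
rate_2 = reported_mflops(size, size, size, elapsed, 2.0)
rate_3 = reported_mflops(size, size, size, elapsed, 3.0)
print("constant 2.0:", rate_2, "MFLOPS")
print("constant 3.0:", rate_3, "MFLOPS")
print("ratio:", rate_3 / rate_2)
```

For the same measured time, the two constants differ only by a fixed factor of 1.5 in the reported rate, so the question is purely one of counting convention.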

        • Incorrect calculation of FLOPS in time_dgemm.f
          chipf

          DGEMM computes C = alpha*A*B + beta*C. Applying alpha and beta costs N*M operations each, so you could add 2*N*M to the total flops. But since this is n squared versus n cubed for the matrix multiply itself, it is typically ignored. If N = 10000, then the extra alpha and beta overhead is only about 1/5000 of the operations.
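The size of that overhead can be checked with exact fractions. This is a rough sketch: the function name is hypothetical, and whether a multiply-add counts as one or two operations is an assumption exposed as a parameter. Counting it as one operation gives the 1/5000 figure above; counting it as two flops (the convention behind the 2.0D-6 constant) gives 1/10000.

```python
from fractions import Fraction


def scaling_overhead(m, n, k, ops_per_multiply_add=2):
    """Fraction of extra alpha/beta work relative to the A*B product.

    The 2*M*N term is the estimate from the post above (alpha and beta
    each touch M*N elements). ops_per_multiply_add is an assumption:
    1 counts a fused multiply-add as one operation, 2 counts it as two flops.
    """
    product_ops = ops_per_multiply_add * m * n * k
    scaling_ops = 2 * m * n
    return Fraction(scaling_ops, product_ops)


size = 10000
print(scaling_overhead(size, size, size, ops_per_multiply_add=1))
print(scaling_overhead(size, size, size, ops_per_multiply_add=2))
```

Either way the overhead shrinks as 1/K, which is why it is usually left out of the count for large problems.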

            • Incorrect calculation of FLOPS in time_dgemm.f
              zeland

              It depends on the implementation.


              for int i=1..N
                for int j=1..M
                  q[i,j] = 0
                  for int r=1..K
                    q[i,j] += a[i,r]*b[r,j]   // 2*N*M*K total (MAD, 2 flops)
                  end
                  c[i,j] = beta*c[i,j] + alpha*q[i,j]   // 3*N*M total (MAD + MUL, 3 flops)
                end
              end



              for int i=1..N
                for int j=1..M
                  c[i,j] = beta*c[i,j]   // N*M total (MUL, 1 flop)
                  for int r=1..K
                    c[i,j] += alpha*a[i,r]*b[r,j]   // 3*N*M*K total (MUL + MAD, 3 flops)
                  end
                end
              end


              So I think you use the 2*N*M*K variant.
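The two loop orderings above can be instrumented to confirm the counts. The following is a plain-Python sketch, not the ACML code: the function names are hypothetical, matrices are lists of lists, and the dimensions follow the pseudocode (a is N x K, b is K x M, c is N x M).

```python
def gemm_scale_after(alpha, beta, a, b, c):
    """First variant: accumulate q = a*b, then apply alpha and beta once per element."""
    n, k, m = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    flops = 0
    for i in range(n):
        for j in range(m):
            q = 0.0
            for r in range(k):
                q += a[i][r] * b[r][j]
                flops += 2  # one multiply, one add
            out[i][j] = beta * out[i][j] + alpha * q
            flops += 3  # two multiplies, one add
    return out, flops


def gemm_scale_inside(alpha, beta, a, b, c):
    """Second variant: alpha is applied inside the innermost loop."""
    n, k, m = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    flops = 0
    for i in range(n):
        for j in range(m):
            out[i][j] = beta * out[i][j]
            flops += 1  # one multiply
            for r in range(k):
                out[i][j] += alpha * a[i][r] * b[r][j]
                flops += 3  # two multiplies, one add
    return out, flops


# Small example with N=2, K=3, M=4.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0], [3.0, 1.0, 0.0, 1.0]]
c = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
result_1, flops_1 = gemm_scale_after(2.0, 3.0, a, b, c)
result_2, flops_2 = gemm_scale_inside(2.0, 3.0, a, b, c)
print(flops_1, flops_2)
```

Both variants produce the same matrix, but the first performs 2*N*M*K + 3*N*M operations and the second N*M + 3*N*M*K, so the right constant for a timing program depends on which ordering the library actually uses.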