AMD 6276 with ACML library gives lower GFLOPS than expected?

Question asked by chandra on Jan 23, 2013
Latest reply on Feb 18, 2013 by chandra

We just got a new cluster with AMD 6276 CPUs, which have 16 cores.

We link against the ACML 5.1.0 library with -L/sopt/acml5.1.0/ifort64_fma4_mp_int64/lib -lacml_mp

We compile our test program with g++; it multiplies real double-precision matrices using dgemm.
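
For reference, a dgemm GFLOPS figure is normally obtained by counting 2*m*n*k floating-point operations for an (m x k) by (k x n) product and dividing by the wall-clock time. The C++ sketch below shows only that bookkeeping. naive_matmul is a hypothetical single-threaded stand-in for the library call (it is not ACML's dgemm), so the absolute numbers it produces will be far below what a tuned BLAS reaches.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Operation count for C = A*B with A (m x k) and B (k x n):
// m*n*k multiplies plus m*n*k adds, i.e. 2*m*n*k. The extra O(m*n)
// work from dgemm's alpha/beta scaling is ignored, as is customary.
double dgemm_flops(std::size_t m, std::size_t n, std::size_t k) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
           static_cast<double>(k);
}

// Convert an operation count and elapsed wall-clock seconds into GFLOPS.
double gflops(double flops, double seconds) {
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

// Hypothetical stand-in for the BLAS call: naive row-major C += A*B for
// square n x n matrices (i-p-j loop order keeps the inner loop contiguous).
void naive_matmul(const std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t p = 0; p < n; ++p) {
            const double aip = a[i * n + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
}

// Time one multiply on caller-provided n x n matrices with a monotonic
// clock and report GFLOPS. c is an output, so the work cannot be discarded.
double time_square_multiply(const std::vector<double>& a,
                            const std::vector<double>& b,
                            std::vector<double>& c, std::size_t n) {
    const auto t0 = std::chrono::steady_clock::now();
    naive_matmul(a, b, c, n);
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return gflops(dgemm_flops(n, n, n), seconds);
}
```

In a real test the stand-in would be replaced by the library's dgemm call; timing several repeated calls rather than a single one usually gives a more stable figure.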

However, on 16 cores we only get 60 GFLOPS, which is about 4 times lower than the theoretical peak of the AMD 6276.

The AMD 6276 is supposed to support the FMA4 instructions, and according to information I found online it should deliver 8 double-precision operations per clock (DP/clock).

Theoretically it should reach 16 cores * 2.3 GHz * 8 DP/clock * 0.85 efficiency ≈ 250 GFLOPS.
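
The peak estimate above is a plain product of four factors, reproduced by the small C++ helper below. The 8 DP/clock per core and the 0.85 efficiency are the values quoted in this post, passed in as inputs rather than verified hardware figures. On Bulldozer-family parts such as the 6276, the two cores of a module share one floating-point unit, so it may be worth checking whether a quoted per-clock figure is per core or per module before plugging it in.

```cpp
#include <cassert>

// Theoretical peak double-precision throughput in GFLOPS:
//   cores * clock (GHz) * DP flops per clock per core * efficiency.
// efficiency = 1.0 gives the raw hardware peak; a value such as 0.85 is an
// assumed fraction of peak that a well-tuned dgemm might reach.
double peak_gflops(int cores, double ghz, double dp_per_clock,
                   double efficiency = 1.0) {
    assert(cores > 0 && ghz > 0.0 && dp_per_clock > 0.0);
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return cores * ghz * dp_per_clock * efficiency;
}
```

With 16 cores, 2.3 GHz, 8 DP/clock and 0.85 efficiency this returns about 250.2; without the efficiency factor it returns about 294.4.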

It looks as though our CPU only delivers 2 DP/clock, similar to an AMD 6275 CPU.
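
Running the same formula backwards turns a measured GFLOPS figure into an effective DP/clock per core, which is how a number like "2 DP/clock" can be read off the 60 GFLOPS result. A minimal sketch, again treating the 0.85 efficiency as an assumed input:

```cpp
#include <cassert>

// Effective double-precision flops per clock per core implied by a measured
// GFLOPS figure, optionally normalised by an assumed efficiency factor.
double effective_dp_per_clock(double measured_gflops, int cores, double ghz,
                              double efficiency = 1.0) {
    assert(measured_gflops >= 0.0 && cores > 0 && ghz > 0.0);
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return measured_gflops / (cores * ghz * efficiency);
}
```

For 60 GFLOPS on 16 cores at 2.3 GHz this gives about 1.9 DP/clock with the 0.85 factor and about 1.6 without it.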

But when I run cat /proc/cpuinfo, it does report an AMD 6276 CPU.
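
The model name alone does not show which instruction-set extensions the kernel reports, so checking the flags line of /proc/cpuinfo for fma4 is a more direct test. A minimal Linux-only sketch; the helper names are hypothetical, and the token match is exact so that "fma" does not match "fma4" or vice versa.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// True if a /proc/cpuinfo "flags" line lists `flag` as a whole token.
bool flags_line_has(const std::string& line, const std::string& flag) {
    assert(!flag.empty());
    if (line.rfind("flags", 0) != 0) return false;  // not a flags line
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) return false;
    std::istringstream tokens(line.substr(colon + 1));
    std::string token;
    while (tokens >> token) {
        if (token == flag) return true;  // exact match only
    }
    return false;
}

// True if the first flags line in the cpuinfo file lists `flag`.
// Returns false if the file cannot be opened (e.g. not running on Linux).
bool cpu_has_flag(const std::string& flag,
                  const std::string& path = "/proc/cpuinfo") {
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("flags", 0) == 0) return flags_line_has(line, flag);
    }
    return false;
}
```

On one of the new nodes, cpu_has_flag("fma4") should return true if the kernel exposes FMA4; this says nothing about whether a given library build actually uses it.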

When I test on an Intel Sandy Bridge CPU with the Intel MKL library, I get 8 DP/clock. On a 2.1 GHz AMD Magny-Cours CPU with ACML, I get 4 DP/clock.

Both of these are as expected. For the AMD 6276, however, the result is 4 times lower than the expected value.

So I am wondering whether we are compiling the program correctly.

I am using the ifort-compiled ACML library and compiling our program with g++. Should I use the gfortran-compiled ACML library instead, or change anything else?

Thanks in advance for any comments.
