canyouseeme

why floating point performance of TR 1950x is so volatile and why 8 double operations/cycle rather than 16 double operations/cycle?

Discussion created by canyouseeme on May 15, 2018
Latest reply on Jun 5, 2018 by canyouseeme

The figure below is the floating point performance vs time. The x axis is time, y axis is GFLOPS, y = 2*n^3/time consumed for BLAS function dgemm(). I have tested serval processors. The performance of TR1950x(the three upper curves) is obviously fluctuant while others(the other three curves) are relatively stable. So what's the reason?

i5_7500_i7_8550U_TR1950x_dgemm_gflops_cmp.png

 

 

Another question. The peak GFLOPS of TR 1950x@3.75G is 480G(AIDA64), 3.75G*16cores*8ops=480G, my best result is 380G(average),414G(max) in dgemm().Ryzen can only do one 256bit FMA per cycle(throughput). It's only the half of intel's solution. So why? cost,power consumption or heat?

 

Windows 10. OpenBLAS v0.2.19, OPENBLAS_CORETYPE=EXCAVATOR.

Outcomes