The figure below shows floating-point performance over time. The x axis is time and the y axis is GFLOPS, computed as y = 2*n^3 / elapsed time for the BLAS function dgemm(). I have tested several processors. The performance of the TR 1950X (the three upper curves) fluctuates noticeably, while the others (the other three curves) are relatively stable. What is the reason?
![i5_7500_i7_8550U_TR1950x_dgemm_gflops_cmp.png i5_7500_i7_8550U_TR1950x_dgemm_gflops_cmp.png](/t5/image/serverpage/image-id/10442i3DBE3889F5F3C826/image-size/large?v=v2&px=999)
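For anyone who wants to reproduce the y values, here is a minimal sketch of the GFLOPS formula above. It assumes a square n x n x n dgemm and an elapsed time in seconds. The function name `dgemm_gflops` and the sample numbers are hypothetical, and the snippet does not call BLAS itself.

```python
def dgemm_gflops(n: int, seconds: float) -> float:
    """GFLOPS for a square n x n x n dgemm, counting 2*n^3 floating-point ops."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * n ** 3          # one multiply and one add per inner-loop step
    return flops / seconds / 1e9  # convert FLOPS to GFLOPS


# Hypothetical sample: n = 4000 finishing in 0.32 s (not a measured value)
print(f"{dgemm_gflops(4000, 0.32):.1f} GFLOPS")
```

Each point on a curve would then be one such value for a single dgemm() call, so short calls are more sensitive to timer resolution and scheduling noise.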
Another question: the peak performance of the TR 1950X at 3.75 GHz is 480 GFLOPS (AIDA64), which matches 3.75 GHz * 16 cores * 8 ops per cycle = 480 GFLOPS. My best result in dgemm() is 380 GFLOPS (average) and 414 GFLOPS (max). Ryzen can only do one 256-bit FMA per cycle (throughput), which is only half of Intel's solution. Why is that? Cost, power consumption, or heat?
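The peak arithmetic can be checked with a small sketch. It takes the 8 ops per cycle per core from the statement above (one 256-bit FMA = 4 doubles * 2 ops) and the clock, core count, and measured results as quoted in this post. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int) -> float:
    """Theoretical double-precision peak: clock * cores * FLOPs per cycle per core."""
    return clock_ghz * cores * flops_per_cycle


# Figures from this post: 3.75 GHz, 16 cores, 8 double-precision ops per cycle
peak = peak_gflops(3.75, 16, 8)
print(f"peak: {peak:.0f} GFLOPS")

# Measured dgemm() results from this post, as a fraction of the peak
for label, measured in (("average", 380.0), ("max", 414.0)):
    print(f"{label}: {measured / peak:.3f} of peak")
```

By this estimate the dgemm() results reach roughly 79% (average) and 86% (max) of the theoretical peak.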
Environment: Windows 10, OpenBLAS v0.2.19, OPENBLAS_CORETYPE=EXCAVATOR.