Processors

canyouseeme
Adept I

Why is the floating-point performance of the TR 1950X so volatile, and why 8 double-precision operations/cycle rather than 16?

The figure below shows floating-point performance over time. The x axis is time and the y axis is GFLOPS, computed as y = 2*n^3 / (time consumed) for the BLAS function dgemm(). I have tested several processors. The performance of the TR 1950X (the three upper curves) clearly fluctuates, while the others (the other three curves) are relatively stable. What is the reason?
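For reference, the y-axis value can be reproduced with a few lines of Python. This is only a sketch of the formula quoted above; the function name `dgemm_gflops` is made up for illustration, and the division by 1e9 (to go from FLOP/s to GFLOPS) is assumed rather than stated in the post.

```python
def dgemm_gflops(n, seconds):
    """GFLOPS of one n x n x n dgemm call that took `seconds` to run.

    A square dgemm performs about 2*n^3 floating-point operations
    (one multiply and one add per inner-product term).
    """
    flops = 2.0 * n ** 3
    return flops / seconds / 1e9  # assumed scaling from FLOP/s to GFLOPS


if __name__ == "__main__":
    # Hypothetical timing: a 4096 x 4096 dgemm finishing in 0.35 s.
    print(dgemm_gflops(4096, 0.35))
```

Plotting this value for every call in a loop gives a curve like the one in the figure.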

i5_7500_i7_8550U_TR1950x_dgemm_gflops_cmp.png

Another question. The peak of the TR 1950X @ 3.75 GHz is 480 GFLOPS (AIDA64): 3.75 GHz * 16 cores * 8 ops/cycle = 480 GFLOPS. My best result in dgemm() is 380 GFLOPS (average) and 414 GFLOPS (max). Ryzen can only do one 256-bit FMA per cycle (throughput), which is only half of Intel's solution. Why? Cost, power consumption, or heat?
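The arithmetic behind the 8 versus 16 ops/cycle figures can be sketched as follows. The helper names are hypothetical, and the model is deliberately simple: it assumes every core sustains the stated number of FMA instructions per cycle at the stated clock.

```python
def flops_per_cycle(vector_bits, fma_per_cycle, element_bits=64):
    """Double-precision FLOPs one core can retire per cycle.

    Each FMA counts as 2 operations (multiply + add) per vector lane.
    """
    lanes = vector_bits // element_bits  # 256-bit vector -> 4 doubles
    return lanes * 2 * fma_per_cycle


def peak_gflops(ghz, cores, flops_cycle):
    """Theoretical peak in GFLOPS: clock (GHz) * cores * FLOPs/cycle."""
    return ghz * cores * flops_cycle


if __name__ == "__main__":
    one_fma = flops_per_cycle(256, 1)  # one 256-bit FMA per cycle
    two_fma = flops_per_cycle(256, 2)  # two 256-bit FMAs per cycle
    peak = peak_gflops(3.75, 16, one_fma)
    print(one_fma, two_fma, peak)
    # Fraction of peak reached by the measured average and maximum
    print(380 / peak, 414 / peak)
```

With the numbers from the post, the measured 380 and 414 GFLOPS come out at roughly 79% and 86% of the 480 GFLOPS peak.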

Windows 10. OpenBLAS v0.2.19, OPENBLAS_CORETYPE=EXCAVATOR.

canyouseeme
Adept I

Maybe it's because of 4K aliasing (in Intel's terminology).

On Windows, setting LDC (the leading dimension of matrix C) to 4096+64 rather than 4096 makes the performance of the TR 1950X much more stable.
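To see why the padding could matter, here is a small sketch that looks at where each column of a column-major double-precision matrix starts, modulo 4096 bytes. The function name is made up for illustration, and it assumes the matrix base address is itself 4 KiB-aligned. It only shows the address pattern; whether 4K aliasing is really what hurts the TR 1950X remains a guess.

```python
def column_offsets_mod_4k(ld, ncols, elem_size=8):
    """Byte offset of each column start, modulo 4096.

    In column-major storage, column j starts ld * elem_size * j bytes
    after the base address (assumed 4 KiB-aligned here).
    """
    return [(j * ld * elem_size) % 4096 for j in range(ncols)]


if __name__ == "__main__":
    # LDC = 4096: every column start has the same low 12 address bits.
    print(column_offsets_mod_4k(4096, 8))
    # LDC = 4096 + 64: column starts are spread 512 bytes apart.
    print(column_offsets_mod_4k(4096 + 64, 8))
```

With LDC = 4096 the column stride is 32768 bytes, an exact multiple of 4096, so accesses to the same row in neighbouring columns share their low 12 address bits. With LDC = 4160 the stride is 33280 bytes, which shifts each column by 512 bytes within a 4 KiB window.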

But on Linux (Kali), the performance is always stable whether or not LDC is divisible by 4K, except for sharp slopes every few minutes.

I'm a little confused now...
