I built a new quad-GPU system with an AMD Threadripper 1950X. To my surprise, numpy linear algebra performance is extremely slow (about 2x slower than even an i7-6700K), never mind the i9 series. It is not just numpy: PyTorch uses MAGMA, and the SVD operation in MAGMA runs on the CPU too. The Threadripper CPU has now become a huge bottleneck for our server.
I ran some benchmarks with the python2 that comes with the Anaconda distribution. With the prebuilt numpy (linked to mkl_rt), the performance is shockingly bad, as I mentioned. With the nomkl package, OpenBLAS performance is slightly better when using more threads, but still pretty bad. I have heard a lot of good things about Threadripper, but maybe scientific computing is not what it is for.
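For anyone who wants to compare numbers on their own machine, here is a minimal sketch of this kind of benchmark. The helper names `best_time` and `run_benchmark`, the matrix size, and the choice of a matrix product plus an SVD are assumptions for illustration, not the exact script behind the figures above. `np.show_config()` prints which BLAS/LAPACK library the numpy build is linked against, which is how to tell an MKL build from an OpenBLAS one.

```python
from __future__ import print_function  # keeps the script usable on python2

import timeit

import numpy as np


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    times = []
    for _ in range(repeats):
        start = timeit.default_timer()
        fn()
        times.append(timeit.default_timer() - start)
    return min(times)


def run_benchmark(n=1000, repeats=3, seed=0):
    """Time an n x n matrix product and an SVD; returns a dict of seconds."""
    rng = np.random.RandomState(seed)  # fixed seed so runs are comparable
    a = rng.rand(n, n)
    b = rng.rand(n, n)
    return {
        "matmul": best_time(lambda: a.dot(b), repeats),
        "svd": best_time(lambda: np.linalg.svd(a), repeats),
    }


if __name__ == "__main__":
    # Shows which BLAS/LAPACK this numpy build is linked against.
    np.show_config()
    for name, seconds in sorted(run_benchmark().items()):
        print("%-6s %.3f s" % (name, seconds))
```

Running the same script in an MKL environment and in a nomkl (OpenBLAS) environment, and again on the Intel machine, gives directly comparable timings. Increase `n` for a more realistic workload.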
Since this is our only experimental server with an AMD CPU, we can switch to an Intel i9 or dual-Xeon platform without too much trouble. But please let me know if there is a way to get acceptable numpy performance out of the Threadripper. As long as it can reach 80% of an i9-7900X's linear algebra performance, we will keep this CPU. Any advice would be much appreciated!