Threadripper parallelization slow: cores not maxed out
I built a PC with an AMD Threadripper 1950X, which has 16 cores. The system is running well and is very responsive. I use the PC mostly to program in the F# language. It is very easy to write parallelized code in F# (as in any functional language) and that is essential for the application I am developing.
Asus Prime X399-A Motherboard
32GB of DDR4-3200 memory
Samsung 960 M.2 NVMe SSD
System overclocked to 4.00 GHz.
I was expecting parallelized code to run much faster than on my old system with an Intel i7 4770K (Haswell processor), which has only four cores. Unfortunately that is not the case. The two systems execute parallelized code at approximately the same speed.
Actually the Threadripper is much faster than the Haswell when running some simple parallelized F# programs I wrote for testing purposes. But for the complex application I am developing the two are roughly equivalent. There is something about a "heavy" application that the Threadripper does not seem to handle that well.
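For reference, here is a minimal sketch of the kind of simple test where the Threadripper scales well. It is not one of the actual test programs; the `burn` function, the iteration count, and the number of work items are made up for illustration. The work is purely CPU-bound, with no allocation and no shared state, so it should load every core almost fully:

```fsharp
open System
open System.Diagnostics

// Hypothetical CPU-bound work item: pure arithmetic, no allocation, no shared state.
let burn (seed: int) : uint64 =
    let mutable x = uint64 seed + 1UL
    for i = 1 to 20000000 do
        // Unchecked 64-bit multiply-add; overflow wraps around by default.
        x <- x * 6364136223846793005UL + 1442695040888963407UL
    x

// Run f, print the elapsed wall-clock time, and return the result.
let time (label: string) (f: unit -> 'a) : 'a =
    let sw = Stopwatch.StartNew()
    let result = f ()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds
    result

// Several work items per logical core so the scheduler can balance the load.
let inputs = [| 1 .. Environment.ProcessorCount * 4 |]

let sequential = time "sequential" (fun () -> Array.map burn inputs)
let parallelRun = time "parallel" (fun () -> Array.Parallel.map burn inputs)

printfn "results match: %b" (sequential = parallelRun)
```

If a test like this scales with the core count while the real application does not, the difference presumably lies in what the application does beyond raw arithmetic (for example allocation or shared data), but that is only a guess on my part.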
I found out that if I disable SMT and work with only 16 (as opposed to 32) logical cores, the Threadripper does better. With SMT enabled it is actually slower than the Haswell.
The cause of the problem seems to be that the Threadripper uses all cores but never loads them fully. Each core runs at about 40-50% utilization, and the overall CPU usage stays in that same range.
I used a program called Process Lasso to get this information.
Does anyone have a suggestion on how to maximize each core's usage when running parallelized code?