Processors

gwiesenekker
Journeyman III

1950x multi-threaded performance

Hi,

I have a performance problem with a multi-threaded program (CPU and memory intensive, not I/O intensive) on a 1950X (ASUS PRIME X399-A motherboard, Corsair Vengeance LPX DDR4 4x16GB@3000 MHz memory): the performance drops by 50% when going from one to four threads. After excluding semaphore locks and the like as the cause of the problem, I ran the same program on an Intel i7-7820HQ (Dell Precision 7520 motherboard, DDR4 4x16GB@2400MHz memory), where the performance drops by only 10%. The OS is Ubuntu 18.04, the kernel version is 4.15.0-38-generic, and the GCC version is 7.3.0.

Any ideas what could be causing this difference/how I can improve the multi-threaded performance on the 1950X?

Thanks,

Gijsbert

9 Replies

What program?


A C program developed by me that plays international draughts. The alpha-beta search (CPU intensive and memory intensive due to the hash tables) is multi-threaded: threads publish work (nodes to be searched) to which other threads can subscribe.

Gijsbert

misterj
Big Boss

Gijsbert, this might at least be interesting: Level1. It does have some interesting tools. Enjoy, John.

gwiesenekker
Journeyman III

Hi,

I have done some more research into this. I have tried different profilers (gprof and my own high-resolution code profiler), different compilers (gcc and aocc/clang), and different algorithms (no hash tables, semaphores replaced by crc32-protected memcpy), but the results are the same: the self-time of all functions (including simple functions that do not call other functions and are not invoked often) slows down by an average factor of 0.6 when going from 1 to 2 threads and by an average factor of 0.4 when going from 1 to 4 threads. The only thing I notice at the operating system level is that when I run 1/2/4 threads, 'cat /proc/cpuinfo' shows 2/4/8 CPUs going from 2.1 to 3.7 GHz, whereas you might expect 1/2/4 CPUs. 'htop' shows the expected 1/2/4 threads and the corresponding 100/200/400% CPU usage.
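
The /proc/cpuinfo check described above could be automated roughly as follows. This is only a minimal sketch, assuming the x86 Linux layout in which every logical CPU reports a line of the form 'cpu MHz : <value>'; the function name count_cpus_at_or_above and the threshold argument are illustrative and not part of the original program.

```c
#include <stdio.h>

/* Count the logical CPUs whose reported clock is at least min_mhz.
   `cpuinfo` is an open stream in /proc/cpuinfo format (x86 layout assumed).
   Lines longer than the buffer are simply read in several chunks, which is
   harmless here because only lines starting with "cpu MHz" are parsed. */
int count_cpus_at_or_above(FILE *cpuinfo, double min_mhz)
{
    char line[256];
    int count = 0;

    while (fgets(line, sizeof(line), cpuinfo) != NULL) {
        double mhz;
        /* White space in the format matches any run of spaces or tabs. */
        if (sscanf(line, "cpu MHz : %lf", &mhz) == 1 && mhz >= min_mhz)
            count++;
    }
    return count;
}
```

Calling it on a stream opened with fopen("/proc/cpuinfo", "r") and a threshold of, say, 3000.0 while the program runs with 1, 2, or 4 threads would give the number of logical CPUs currently reporting a boosted clock.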

Any suggestions?

Regards,

Gijsbert

gwiesenekker
Journeyman III

Hi,

I have now run a couple of sysbench (version 1.0.11) benchmarks. 'sysbench --test=memory --num-threads=N run' shows that 'MiB transferred/sec' decreases from 5564 to 3024 to 2154 for 1/2/4 threads on my 1950X system, but increases from 5944 to 7010 to 9272 for 1/2/4 threads on my i7-7820HQ system.

How does this test scale on your 1950x system?

Regards,

Gijsbert


gwiesenekker, is all your testing on Linux? Can you suggest a similar Windows 10 test? Those upside-down results compared with your i7 above may interest AMD. Enjoy, John.

gwiesenekker
Journeyman III

Hi,

A comment in the memory benchmark from SiSoftware Sandra, 'We finally discover an issue – TR (just like Ryzen) memory latencies (in-page, random access pattern) are huge – almost 3x higher than Intel’s.', led me to the root cause: you have to set the thread affinity on the 1950X! My first attempt (associating thread 0 with CPU 0, thread 1 with CPU 1, etc.) already greatly improved the multi-threaded performance of my program.
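
For anyone who wants to try the same thing, pinning each worker thread to its own CPU can be sketched in C as below. This is a minimal sketch and not the code from my program: it assumes Linux with glibc (pthread_setaffinity_np is a GNU extension), the names nth_allowed_cpu, pin_self_to_cpu, run_pinned_workers and MAX_WORKERS are made up for the example, and instead of hard-coding "thread i on CPU i" it maps thread i to the i-th CPU the process is currently allowed to use.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define MAX_WORKERS 64  /* arbitrary limit for this sketch */

/* Return the n-th CPU in `allowed` (wrapping around when n exceeds the
   number of CPUs in the set), or -1 if the set is empty or n is negative. */
int nth_allowed_cpu(const cpu_set_t *allowed, int n)
{
    int count = CPU_COUNT(allowed);
    if (count == 0 || n < 0)
        return -1;
    int target = n % count;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, allowed) && target-- == 0)
            return cpu;
    }
    return -1;
}

/* Pin the calling thread to one CPU. Returns 0 on success, else an errno value. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

struct worker_arg {
    int cpu;        /* CPU this worker should run on */
    int pinned_ok;  /* set to 1 by the worker if pinning succeeded */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    arg->pinned_ok = (pin_self_to_cpu(arg->cpu) == 0);
    /* ... the actual work (e.g. the search) would run here ... */
    return NULL;
}

/* Start `nthreads` workers, worker i pinned to the i-th allowed CPU.
   Returns the number of workers that were pinned successfully,
   or -1 if the arguments or the affinity query are invalid. */
int run_pinned_workers(int nthreads)
{
    pthread_t tids[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    cpu_set_t allowed;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    /* CPUs the calling thread may use; this honours an outer taskset. */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i].cpu = nth_allowed_cpu(&allowed, i);
        args[i].pinned_ok = 0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }

    int pinned = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        pinned += args[i].pinned_ok;
    }
    return pinned;
}
```

Pinning from inside the worker keeps the affinity logic next to the work itself. Which logical CPUs are the best targets depends on the machine's topology, so the simple "i-th allowed CPU" mapping is just a starting point.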

Regards,

Gijsbert


FYI, here are 'sysbench --test=memory --num-threads=N run' results without and with setting the thread affinity. They speak for themselves:

$ sysbench --threads=1 --test=memory run | grep -i mib/sec
62991.16 MiB transferred (6297.75 MiB/sec)

$ sysbench --threads=2 --test=memory run | grep -i mib/sec
31019.36 MiB transferred (3101.29 MiB/sec)

$ taskset 0x3 sysbench --threads=1 --test=memory run | grep -i mib/sec
61560.52 MiB transferred (6154.75 MiB/sec)

$ taskset 0x3 sysbench --threads=2 --test=memory run | grep -i mib/sec
102400.00 MiB transferred (10305.26 MiB/sec)
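
If you would rather not wrap your own program in taskset, roughly the same restriction can be applied from inside the program. The sketch below assumes Linux with glibc; cpu_set_from_mask and restrict_to_mask are hypothetical helper names, and the mask uses the same convention as taskset (bit i set means CPU i is allowed, so 0x3 means CPUs 0 and 1).

```c
#define _GNU_SOURCE
#include <sched.h>

/* Translate a taskset-style bit mask (bit i set means CPU i is allowed)
   into a cpu_set_t. Returns the number of CPUs in the resulting set. */
int cpu_set_from_mask(unsigned long mask, cpu_set_t *set)
{
    CPU_ZERO(set);
    for (int cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++) {
        if (mask & (1UL << cpu))
            CPU_SET(cpu, set);
    }
    return CPU_COUNT(set);
}

/* Restrict the calling thread to the CPUs in `mask`. Threads created
   afterwards inherit the restriction. Returns 0 on success and -1 on
   failure; an empty mask is rejected without calling the kernel. */
int restrict_to_mask(unsigned long mask)
{
    cpu_set_t set;
    if (cpu_set_from_mask(mask, &set) == 0)
        return -1;
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling restrict_to_mask(0x3) at the start of main, before any threads are created, should have an effect similar to 'taskset 0x3', because new threads inherit the affinity mask of the thread that creates them.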

Regards,

Gijsbert


Thanks much, gwiesenekker.  I will ask again: do you know of a Windows test that will expose this?  Thanks and enjoy, John.
