9 Replies Latest reply on Nov 26, 2018 2:47 PM by misterj

    1950x multi-threaded performance

    gwiesenekker

      Hi,

       

      I have a performance problem with a multi-threaded (CPU- and memory-intensive, not I/O-intensive) program on a 1950X (ASUS PRIME X399-A motherboard, Corsair Vengeance LPX DDR4 4x16GB@3000 MHz memory): performance drops by 50% when going from one to four threads. After excluding semaphore locks and the like as the cause, I ran the same program on an Intel i7-7820HQ (Dell Precision 7520 motherboard, DDR4 4x16GB@2400 MHz memory), where performance drops by only 10%. The OS is Ubuntu 18.04, the kernel version is 4.15.0-38-generic, and the GCC version is 7.3.0.

       

      Any ideas what could be causing this difference, or how I can improve the multi-threaded performance on the 1950X?

       

      Thanks,

      Gijsbert

        • Re: 1950x multi-threaded performance
          misterj

          Gijsbert, this might at least be interesting: Level1. It does have some interesting tools. Enjoy, John.

          • Re: 1950x multi-threaded performance
            gwiesenekker

            Hi,

             

            I have done some more research into this. I have tried different profilers (gprof and my own high-resolution code profiler), different compilers (gcc and aocc/clang), and different algorithms (no hash tables, semaphores replaced by crc32-protected memcpy), but the results are the same: the self-time of all functions (including simple functions that do not call other functions and are not often invoked) slows down by an average factor of 0.6 when going from 1 to 2 threads and by an average factor of 0.4 when going from 1 to 4 threads. The only thing I notice at the operating system level is that when I run 1/2/4 threads, 'cat /proc/cpuinfo' shows 2/4/8 CPUs going from 2.1 to 3.7 GHz, whereas you would perhaps expect 1/2/4 CPUs. 'htop' shows the expected 1/2/4 threads and the corresponding 100/200/400% CPU usage.

             

            Any suggestions?

             

            Regards,

            Gijsbert

             

            • Re: 1950x multi-threaded performance
              gwiesenekker

              Hi,

               

              I have now run a couple of sysbench (version 1.0.11) benchmarks. 'sysbench --test=memory --num-threads=N run' shows that 'MiB transferred/sec' decreases from 5564 to 3024 to 2154 for 1/2/4 threads on my 1950X system, but increases from 5944 to 7010 to 9272 for 1/2/4 threads on my i7-7820HQ system.

               

              How does this test scale on your 1950x system?

               

              Regards,

              Gijsbert

              • Re: 1950x multi-threaded performance
                gwiesenekker

                Hi,

                 

                A comment in the memory benchmark from SiSandra, 'We finally discover an issue – TR (just like Ryzen) memory latencies (in-page, random access pattern) are huge – almost 3x higher than Intel’s.', allowed me to find the root cause: you have to set the thread affinity on the 1950X! My first attempt (associating thread 0 with CPU 0, thread 1 with CPU 1, etc.) already greatly improved the multi-threaded performance of my program.
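
For readers who want to try the same thing, here is a minimal sketch of per-thread pinning (thread i on CPU i), assuming a pthreads program on Linux with glibc. The names pin_self_to_cpu, worker, run_pinned_workers and NUM_THREADS are illustrative only and are not taken from the original program, whose code is not shown in this thread.

```c
#define _GNU_SOURCE          /* needed for CPU_* macros and pthread_setaffinity_np */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NUM_THREADS 4        /* illustrative thread count */

/* Pin the calling thread to one logical CPU.
   Returns 0 on success, otherwise an errno-style error code. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Hypothetical worker: pin first, then run the actual workload. */
static void *worker(void *arg)
{
    int cpu = *(const int *)arg;
    int err = pin_self_to_cpu(cpu);
    if (err != 0)
        fprintf(stderr, "pinning to CPU %d failed (error %d)\n", cpu, err);
    /* ... CPU- and memory-intensive work would go here ... */
    return NULL;
}

/* Start NUM_THREADS workers, thread i pinned to CPU i, and wait for them.
   Returns 0 if every thread was created; pinning failures are only logged. */
int run_pinned_workers(void)
{
    pthread_t threads[NUM_THREADS];
    int cpus[NUM_THREADS];
    int started = 0, rc = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        cpus[i] = i;
        rc = pthread_create(&threads[i], NULL, worker, &cpus[i]);
        if (rc != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return rc;
}
```

Calling run_pinned_workers() from main and building with 'gcc -O2 -pthread' should be enough to experiment with it. Pinning to a CPU number that does not exist, or that the process is not allowed to use, fails with an error code instead of silently running unpinned.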

                 

                Regards,

                Gijsbert

                  • Re: 1950x multi-threaded performance
                    gwiesenekker

                    FYI, here are 'sysbench --test=memory --num-threads=N run' results without and with setting the thread affinity. They speak for themselves:

                     

                    $ sysbench --threads=1 --test=memory run | grep -i mib/sec

                    62991.16 MiB transferred (6297.75 MiB/sec)

                    $ sysbench --threads=2 --test=memory run | grep -i mib/sec

                    31019.36 MiB transferred (3101.29 MiB/sec)

                    $ taskset 0x3 sysbench --threads=1 --test=memory run | grep -i mib/sec

                    61560.52 MiB transferred (6154.75 MiB/sec)

                    $ taskset 0x3 sysbench --threads=2 --test=memory run | grep -i mib/sec

                    102400.00 MiB transferred (10305.26 MiB/sec)
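
In 'taskset 0x3' the mask 0x3 has bits 0 and 1 set, so the process may only run on logical CPUs 0 and 1. A rough sketch of applying the same kind of mask from inside a program, assuming Linux with glibc, could look as follows. The function name restrict_to_cpu_mask is hypothetical, and the sketch only handles CPUs that fit into one unsigned long.

```c
#define _GNU_SOURCE          /* needed for CPU_* macros and sched_setaffinity */
#include <sched.h>

/* Restrict the calling thread to the logical CPUs whose bits are set in
   mask (bit 0 = CPU 0, bit 1 = CPU 1, ...), like the mask given to taskset.
   Threads created afterwards inherit the restriction.
   Returns 0 on success, -1 on error (errno is set). */
int restrict_to_cpu_mask(unsigned long mask)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++) {
        if (mask & (1UL << cpu))
            CPU_SET(cpu, &set);
    }
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling restrict_to_cpu_mask(0x3) at the start of main, before any threads are created, should give roughly the same placement as launching the program under 'taskset 0x3'.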

                     

                    Regards,

                    Gijsbert