Server Processors

jaj2276 · 10-26-2020

We currently run a production system on a variety of Intel processors. Our current "gold standard" system is a Dell R640 with dual Xeon Platinum 8168 2.7 GHz chips.

We recently purchased a Dell R6525 with dual AMD EPYC 7642 processors.

We're running CentOS 7.8.2003 with the 3.10.0-1127 kernel. We've read as much literature as we could find, but can't seem to find any settings that allow the AMD system to equal, let alone surpass, the performance of our Intel system.

Does anyone have real-world experience with a similar system, and what did you need to do to get it to perform at the level the benchmarks suggest it could?

Anonymous · 10-26-2020

Hello jaj2276,

Each workload can require different tuning, and without knowing what your workload is I can't give specific recommendations. Have you looked at the following page for our various tuning guides? https://developer.amd.com/resources/epyc-resources/epyc-tuning-guides/

In particular, see the workload tuning guide to find a profile that generally matches your workload: https://developer.amd.com/wp-content/resources/56745_0.80.pdf

jaj2276 · 10-27-2020

Hi mbaker_amd,

Thanks for the reply. I intentionally left out details because I didn't want to bore people, so here is a bit more information. Our application is a Java 8 app that doesn't do anything particularly aggressive, except that it runs a lot of threads in one process. That's not ideal, but it's what we have to work with.

We also have a third-party process that consumes lots of data over a 25G Solarflare card using Onload. That process is written in C++ and consumes 15 cores (pinned). We pin another 4 threads to CPUs that act as clients to this process: two of them execute Java code and the other two execute C++ code.
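For anyone reproducing a pinned layout like this, here is a minimal sketch of one sanity check: whether a set of pinned CPUs falls inside a single NUMA node. It assumes the Linux cpulist format (the one used by /sys/devices/system/node/node0/cpulist). The CPU ranges and the helper names `parse_cpulist` and `outside_node` are hypothetical and are not taken from the machines described above.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8,10-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def outside_node(pinned, node_cpus):
    """Return the pinned CPUs that do not belong to the given NUMA node."""
    return sorted(set(pinned) - set(node_cpus))


if __name__ == "__main__":
    # Hypothetical values: node 0 owns CPUs 0-23, the process is pinned to 10-24.
    node0 = parse_cpulist("0-23")
    pinned = parse_cpulist("10-24")
    print("pinned CPUs outside node 0:", outside_node(pinned, node0))
    # On Linux, the real affinity mask of the current process is also available.
    if hasattr(os, "sched_getaffinity"):
        print("current affinity:", sorted(os.sched_getaffinity(0)))
```

On a real system the node string would come from the sysfs file rather than a literal, and the pinned set from whatever tool applies the pinning.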

We look at two sets of metrics. The first set is provided by the vendor and covers various parts of their pipeline. There, P50 is about 10% worse than on our Intel machine, while P99 can be up to 2x as bad.

The second set is our own metrics, which attempt to show end-to-end transaction time in our Java code. Here we're seeing 10-20% worse times in both the P50 and P99 measurements.
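To make the P50/P99 comparison concrete, here is a minimal sketch of how those percentiles could be computed from raw latency samples and compared between two machines. It uses the nearest-rank definition of a percentile, which is an assumption: the vendor's tooling may interpolate differently. The sample values are synthetic and the function names are hypothetical; none of this is measured data from either machine.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def relative_slowdown(baseline, candidate, p):
    """Fractional increase of candidate over baseline at percentile p (0.10 means 10% worse)."""
    base = percentile(baseline, p)
    return (percentile(candidate, p) - base) / base


# Synthetic latencies in microseconds, purely illustrative.
intel_us = [10, 11, 12, 10, 11, 13, 10, 12, 11, 40]
amd_us = [11, 12, 13, 11, 12, 14, 11, 13, 12, 80]

for p in (50, 99):
    print(f"P{p}: {relative_slowdown(intel_us, amd_us, p):+.0%}")
```

With only a handful of samples, P99 is simply the largest value, so a real comparison needs enough samples per run for the tail to be meaningful.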

I've run various benchmarks and am getting conflicting results. The following tests show our Intel machines outpacing our AMD machine.

perf bench sched/messaging

perf bench memcpy movsq/movsb

perf bench memset movsq/movsb

perf bench futex/requeue

perf bench futex/lock-pi

sysbench memory --memory-access-mode=rnd --threads=64

sysbench mutex --mutex-num=1 --threads=512

Other tests, like perf bench numa and the default/unrolled versions of memcpy/memset, show the AMD machine being much faster.

I installed the Phoronix Test Suite on each machine last night and ran the stress-ng set of tests. The only test where the Intel machine seemed to do appreciably better was the forking test.

I will look at the guides you mentioned and see if we can find the set of parameters that makes our AMD machine what we had hoped it would be when we bought it. Thanks again for taking a look.

Tuning CentOS/RedHat for EPYC 7642 processors