My question is about the performance of an EPYC 7742 server, in particular on the HPCG benchmark. I'm following the "AMD - High Performance Computing: Tuning Guide for AMD EPYC 7002 Series Processors." The problem is that HPCG performance scales poorly as I increase the number of MPI ranks toward the high end of the range. To compare my results against someone else's, I found this post (https://www.pugetsystems.com/labs/hpc/HPC-Parallel-Performance-for-3rd-gen-Threadripper-Xeon-3265W-and-EPYC-7742-HPL-HPCG-Numpy-NAMD-1717/), and its 6th figure (HPCG scaling) shows results very similar to mine. In fact, a 7551 gives a higher HPCG score, which is not what I expected. My background is definitely not in microprocessor design, so I might be missing something, but my only explanation is that the shared L3 cache is limiting memory access (HPCG solves a sparse system, not a dense one, so it depends heavily on memory access rather than raw compute). I don't know how many people in the AMD Server Gurus community are familiar with this benchmark, but I was wondering if anyone has comments or suggestions on system setup, or anything else, that could improve the benchmark result.
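To make the memory-access reasoning a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under assumptions I am adding here, not a measurement: a single socket with all 8 DDR4 channels populated at 3200 MT/s, 8 bytes per transfer per channel, and the theoretical peak bandwidth shared evenly between MPI ranks. The function names are made up for the example. If the benchmark is limited by memory traffic, the share available to each rank shrinks as ranks are added, which would be consistent with the flattening I see.

```python
# Back-of-the-envelope estimate of the DRAM bandwidth available per MPI rank.
# All hardware numbers below are ASSUMPTIONS for illustration (one socket,
# 8 DDR4 channels at 3200 MT/s), not values measured on my system.

def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1000 MB)."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


def bandwidth_per_rank_gbs(peak_gbs, ranks):
    """Even split of the peak across ranks; an optimistic upper bound."""
    return peak_gbs / ranks


ASSUMED_CHANNELS = 8      # assumption: all memory channels populated
ASSUMED_MTS = 3200        # assumption: DDR4-3200

peak = peak_dram_bandwidth_gbs(ASSUMED_CHANNELS, ASSUMED_MTS)
print(f"Theoretical peak: {peak:.1f} GB/s")

for ranks in (1, 2, 4, 8, 16, 32, 64):
    share = bandwidth_per_rank_gbs(peak, ranks)
    print(f"{ranks:3d} ranks: at most {share:6.1f} GB/s per rank")
```

The per-rank figures are upper bounds: a single core generally cannot draw the full socket bandwidth on its own, and achievable bandwidth is below the theoretical peak. The sketch also ignores the L3 cache entirely, so it does not confirm or rule out my L3 explanation.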
Thanks in advance.