Competitive Performance Claims: An Industry Standard Based Approach

raghu_nambiar

AMD is committed to industry standard benchmarks: fully transparent, peer-reviewed or independently audited, and verifiable, with full disclosure reports that allow others to reproduce our results. We are proud of our 300+ world records across a diverse set of important workload areas relevant to our customers.

Recently Intel published a set of benchmarks comparing the performance of the Intel Xeon 8462Y+ CPU vs. the AMD EPYC™ 9354 CPU for several workloads. This publication raises several questions about the methodology used. Why was a newer version of the operating system (almost always with better performance) used for the Xeon configuration than for the AMD EPYC™ configuration? Why was a high-performance Xeon processor, the 8462Y+, compared against a mainstream AMD EPYC™ 9354 processor? The AMD EPYC™ 9374F, which has consistently performed exceptionally well across many workloads, would have been the “comparable” processor. Lastly, why did the publication rely on non-industry-standard benchmarks, with only a limited number of verifiable industry benchmarks and no transparency? Without more transparency it is not clear whether the AMD EPYC™ system was tuned for best performance as recommended in the published AMD EPYC Tuning Guides.

Results published by industry partners for consortia-based standard benchmarks represent a consistent way of showcasing performance of computing systems from a variety of vendors. These benchmarks require a rigorous adherence to the benchmark kits, audit methods and review processes to ensure compliance and a consistent and fair manner of comparison across system types and configurations.

Mainstream Compute

Here are a few examples of broadly used standard, verifiable benchmarks used by the industry to assess real-world performance of mainstream use cases. It is important to understand that all of this testing was performed by our partners. The results speak for themselves:

  • SPECcpu® 2017 is the most popular benchmark for measuring processor performance. It consists of a suite of compute intensive micro-benchmarks selected by a committee of industry and academia. Table 1 shows AMD EPYC processors delivering undisputed performance leadership at both 32 cores and top of stack.[1]

 

| Benchmark | 32-core (8462Y+ vs. 9374F) | Top of Stack (8490H vs. 9654) |
|---|---|---|
| SPECrate®2017_int_base | 676 vs. 827 (1.22x faster) | 1010 vs. 1800 (1.78x faster) |
| SPECrate®2017_fp_base | 782 vs. 964 (1.23x faster) | 1020 vs. 1480 (1.45x faster) |

Table 1: SPECcpu® 2017 performance comparisons

  • SPECjbb® 2015 is a popular yardstick that enables fair performance measurements of server-side Java based applications. SPECjbb® 2015 simulates a company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations. The rapid adoption of Java across the industry over the last two decades makes this benchmark relevant to all audiences, including Java Virtual Machine (JVM) vendors, hardware developers, Java application developers, researchers, and members of the academic community. Table 2 shows another example of undisputed AMD EPYC performance leadership at both 32 cores and top of stack.[2]

 

| Benchmark | 32-core (8462Y+ vs. 9374F) | Top of Stack (8490H vs. 9654) |
|---|---|---|
| SPECjbb2015 MultiJVM max-jOPS | 279,312 vs. 359,294 (1.29x faster) | 505,379 vs. 828,952 (1.64x faster) |

Table 2: SPECjbb® 2015 performance comparisons

  • RDBMS: Let’s look at relational database benchmarks. SAP-SD is a popular benchmark designed to help customers find the appropriate hardware configuration for their IT solutions. A 2P system powered by 96-core AMD EPYC 9654 processors delivered 809,570 SAPS, compared to 428,730 SAPS for a 2P system powered by Intel Xeon Platinum 8490H processors, a performance uplift of ~1.88x at the system level.[4] TPC Benchmark™ E (TPC-E) is an industry standard for benchmarking transaction processing systems. A single-socket server powered by a 96-core AMD EPYC 9654 processor outperformed a two-socket server powered by two 60-core Intel Xeon Platinum 8490H processors, with both systems running Microsoft® SQL Server.[5]

  • Virtualization: VMware® VMmark3® is the industry leading enterprise virtualization consolidation benchmark that measures the performance and scalability of the VMware vSphere® hypervisor on a variety of hardware vendor platforms. AMD has dominated this space in recent years, establishing world record virtualization performance in the configurations that matter most to our customers: 2 Node 4 Total Socket SAN, 4 Node 8 Total Socket vSAN, and Overall Leadership. The current generation 96-core AMD EPYC 9654 delivered 40.51 @ 43 Tiles while a top of stack 60-core Intel Xeon Platinum 8490H delivered 23.38 @ 23 Tiles in a similar 2 node, 4 total socket configuration, a ~1.73x performance advantage. This performance leadership is not limited to 4th Gen AMD EPYC processors: the 3rd Gen AMD EPYC 7773X processor also outperforms the Intel Xeon Platinum 8490H.[6]

  • SPECpower_ssj® 2008: The SPECpower_ssj2008 benchmark suite measures the power and performance characteristics of systems. A two-processor AMD EPYC 9654 system has a power efficiency of 30,602 while a two-processor Intel Xeon 8490H system has a power efficiency of 16,902 when comparing overall ssj_ops/watt metric of SPECpower_ssj2008, based on published results at spec.org - a ~1.81x higher energy efficiency for the AMD EPYC based server. [3]
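
The multipliers quoted in this section follow directly from the published scores. As a quick sanity check, the minimal Python sketch below divides each AMD score by the corresponding Intel score. The dictionary layout and the `speedup` helper are illustrative assumptions, not part of any benchmark kit; the numbers are copied from the text above.

```python
# Published scores quoted in this section, stored as (Intel, AMD) pairs.
# The labels and layout are illustrative; only the numbers come from the post.
SCORES = {
    "SPECrate2017_int_base, 32-core": (676, 827),
    "SPECrate2017_int_base, top of stack": (1010, 1800),
    "SPECrate2017_fp_base, 32-core": (782, 964),
    "SPECrate2017_fp_base, top of stack": (1020, 1480),
    "SPECjbb2015 MultiJVM max-jOPS, 32-core": (279_312, 359_294),
    "SPECjbb2015 MultiJVM max-jOPS, top of stack": (505_379, 828_952),
    "SAP-SD SAPS, 2P": (428_730, 809_570),
    "VMmark3 score, 2 node": (23.38, 40.51),
    "SPECpower_ssj2008 overall ssj_ops/watt, 2P": (16_902, 30_602),
}


def speedup(intel_score, amd_score):
    """Return the AMD score divided by the Intel score (higher is better)."""
    if intel_score <= 0:
        raise ValueError("baseline score must be positive")
    return amd_score / intel_score


# Print every ratio rounded to two decimals.
for name, (intel, amd) in SCORES.items():
    print(f"{name}: {speedup(intel, amd):.2f}x")
```

Rounded to two decimals, the printed ratios match the multipliers quoted above, with one small difference: SAP-SD prints as 1.89x, where the text gives the slightly more conservative ~1.88x.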

Artificial Intelligence

The Artificial Intelligence (AI) ecosystem continues to evolve. Benchmarks and workloads are in constant flux. Accelerators, such as Intel’s AMX, can aid some compute-bound portions of the workload. By contrast, the AMD strategy focuses on offering the highest performing general-purpose cores that deliver high performance across the widest range of workloads. Many AI workloads are memory bound (such as many Large Language Models, or LLMs) and therefore either do not benefit from AMX or see only limited speedups because of Amdahl’s Law.

AI cycles that become a dense portion of the application often get offloaded to accelerators, such as the AMD Instinct MI250 or AMD Alveo V70. Ongoing market evolution will drive any future decisions to add acceleration to our general-purpose devices. AMD leverages our strong IP and software portfolio, such as our 7040U CPUs in client. Inference usually comprises a small portion of the overall workflow; even a large inference speedup typically only delivers a small overall speedup.
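
Amdahl’s Law makes this point concrete: if a fraction p of the total runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below is a minimal illustration of that formula. The fractions and the 10x acceleration factor are assumed values chosen for illustration, not measurements from any system discussed here.

```python
def amdahl_speedup(accelerated_fraction, acceleration_factor):
    """Overall speedup when only part of a workload is accelerated.

    accelerated_fraction: share of the original runtime that benefits (0..1).
    acceleration_factor:  how much faster that share runs (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if acceleration_factor <= 0:
        raise ValueError("acceleration_factor must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / acceleration_factor)


# Hypothetical inference shares of an end-to-end workflow, each paired
# with an assumed 10x speedup of the inference step alone.
for share in (0.1, 0.3, 0.5):
    overall = amdahl_speedup(share, 10.0)
    print(f"inference share {share:.0%}: overall speedup {overall:.2f}x")
```

Under these assumed values, a 10x inference speedup applied to a 10% share of the workflow yields an overall speedup of only about 1.10x, which is why end-to-end measurements matter more than the speedup of a single stage.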

A representative AI benchmark holistically measures performance across the overall workflow. TPC Express Benchmark AI (TPCx-AI) from the Transaction Processing Performance Council seeks to become an industry standard by measuring representative end-to-end AI use cases in both datacenters and the cloud. This benchmark covers 10 real-world use cases across different scale factors (dataset sizes): customer segmentation, customer conversation transcription, sales forecasting, spam detection, price prediction, hardware failure, product rating, classification of trips, facial recognition, and fraud detection. AMD is proud to have leadership performance and price-performance at scale factors 3, 10, 30, 100, 300, and 1000. Please visit tpc.org for the most up-to-date results. There are no published Intel Sapphire Rapids results as of this blog’s publication date.

High Performance Computing

High Performance Computing (HPC) has been a priority for AMD since the launch of our 1st Gen AMD EPYC processors in 2017. Here again, AMD EPYC processors continue to deliver leadership performance, from enterprises to national labs, at both 32 cores and top of stack vs. the competition. AMD engineers diligently ensure that all platforms are represented in the best possible light by properly tuning all AMD and competing systems for maximum performance, for example by using comparable hardware setups and BIOS settings and the same operating system and options on all systems. Figures 1 and 2 show relative performance numbers as a composite average of benchmarks for each application tested.[7]


Figure 1: 32-Core Performance Comparison


Figure 2: Top of Stack Performance Comparison

Stay tuned for ongoing performance updates starting with the AMD Accelerated Data Center Premiere tomorrow and continuing into the rest of 2023!

Conclusion

AMD relies on independent testing that is performed, audited, and published by our ecosystem partners. We also perform our own internal testing, investing our time and resources to characterize the systems, understand the nuances, and tune the systems, both our own and the competition’s, for maximum performance. At AMD, our customers are important to us, and it is therefore important to us to showcase how our processors perform for relevant workloads. Both top-of-stack vs. top-of-stack comparisons for scale-up workloads, where system-level performance is critical, and similar comparisons for cloud deployments, where virtual machine density is important, are meaningful to our customers. Please visit AMD EPYC™ Technical Whitepapers and Briefs to see our extensive, growing library of Performance Briefs and other AMD and third-party content that speaks directly to the ongoing performance leadership delivered by AMD EPYC™ processors. Please also see our library of AMD EPYC™ Server Performance Tuning Guides for instructions on how to get the most out of your AMD EPYC™ processors for a growing variety of workloads.

Raghu Nambiar is a Corporate Vice President of Data Center Ecosystems and Solutions for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

References

  1. SPECcpu® 2017 results as of June 12, 2023. Integer Rate Results. AMD: https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230410-35820.html, https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230424-36017.html. Intel: https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230523-36893.html, https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230522-36594.html. Floating Point Rate Results. AMD: https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230410-35818.html, https://www.spec.org/cpu2017/results/res2022q4/cpu2017-20221024-32605.html. Intel: https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230523-36905.html, https://www.spec.org/cpu2017/results/res2023q1/cpu2017-20230310-34571.html
  2. SPECjbb® 2015 results as of June 12, 2023. AMD: https://www.spec.org/jbb2015/results/res2023q1/jbb2015-20230308-01023.html, https://www.spec.org/jbb2015/results/res2023q2/jbb2015-20230419-01034.html. Intel: https://www.spec.org/jbb2015/results/res2023q1/jbb2015-20230308-01026.html, https://www.spec.org/jbb2015/results/res2023q1/jbb2015-20230119-01006.html
  3. SPECpower_ssj® 2008 results as of June 12, 2023. AMD: https://www.spec.org/power_ssj2008/results/res2022q4/power_ssj2008-20221204-01204.html, Intel: https://www.spec.org/power_ssj2008/results/res2023q2/power_ssj2008-20230507-01251.html
  4. SAP-SD Benchmark scores as of June 12, 2023. 2 x Intel Xeon Platinum 8490H Processor (1.90 GHz, 120 cores, 240 threads) SAP ASE 16. SAPS 428,730 https://www.sap.com/dmc/benchmark/2023/Cert23021.pdf. 2 x AMD EPYC 9654 processor (2.4 GHz, 192 cores, 384 threads) SAP ASE 16. SAPS 809,570 https://www.sap.com/dmc/benchmark/2022/Cert22029.pdf. 809,570/428,730 ≈ 1.888, i.e. ~88.8% higher.
  5. TPC Benchmark E results as of June 12, 2023. Result ID: 123052301. 13,000.00 tpsE at $74.09 per tpsE available on May 24, 2023. Result ID: 123031001. 12,436.66 tpsE at $95.46 per tpsE available on May 18, 2023. https://www.tpc.org/tpce/results/tpce_perf_results5.asp
  6. VMmark 3 results as of June 12, 2023. AMD 9654 2 Node 2 Socket: 40.51 @ 43 Tiles, AMD 7773X 2 Node 2 Socket: 23.64 @ 24 Tiles. Intel 8490H 2 Node 2 Socket: 23.38 @ 23 Tiles
  7. HPC testing was performed on the following systems and obtained the following results:
  • System configurations:
    • 32-core AMD: CPU: 2 x AMD EPYC 9374F; Frequencies (base|boost): 3.85 GHz | 4.10 GHz; Cores: 32 cores/socket (64 threads); L3: 256 MB per CPU; Memory: 1.5 TB (24x) Dual-Rank DDR5 4800 64 GB DIMMs 1 DPC; NIC: 25 Gb Ethernet CCX512-A ConnectX-5 (fw 16.35.2000); InfiniBand: 200 Gb HDR ConnectX-6 VPI (fw 20.35.2000); Storage: Samsung MZQL21T9HCJR-00A07 1.92 TB; BIOS: 1007D; BIOS options: SMT=OFF, NPS=4, Determinism=Power, OS: RHEL 8.7 (kernel 4.18.0-425.3.1.el8.x86_64); OS options: amd_iommu=ON, iommu=pt, mitigations=off, clear caches, NUMA balancing=0, THP=on, CPU governor=Performance, C2 states=disabled
    • Top of stack AMD: CPU: 2 x AMD EPYC 9684X; Frequencies (base|boost): 2.55 GHz | 3.70 GHz (up to); Cores: 96 cores/socket (192 threads); L3: 1152 MB per CPU; Memory: 1.5 TB (24x) Dual-Rank DDR5 4800 64 GB DIMMs 1 DPC; NIC: 25 Gb Ethernet CCX512-A ConnectX-5 (fw 16.35.2000); InfiniBand: 200 Gb HDR ConnectX-6 VPI (fw 20.35.2000); Storage: Samsung MZQL21T9HCJR-00A07 1.92 TB; BIOS: 1007D; BIOS options: SMT=OFF; NPS=4; Determinism=Power; OS: RHEL 8.7 (kernel 4.18.0-425.3.1.el8.x86_64); OS options: amd_iommu=ON, iommu=pt, mitigations=off, clear caches, NUMA balancing=0, THP=on, CPU governor=Performance, C2 states=disabled
    • 32-core Intel®: CPU: 2x Intel® Xeon® Platinum 8462Y+; Frequencies (base|boost): 2.40 GHz | 4.10 GHz (up to); Cores: 32 cores per socket (64 threads); L3: 60 MB per CPU; Memory: 1.0 TB (16x) Dual-Rank DDR5 4800 64 GB DIMMs 2 DPC; NIC: 25 Gb Ethernet CCX512-A ConnectX-5 (fw 16.35.2000); InfiniBand: 200 Gb HDR ConnectX-6 VPI (fw 20.35.2000); Storage: Samsung MZQL21T9HCJR-00A07 1.92 TB; BIOS: ESE110Q-1.10; BIOS options: Hyperthreading=Off, Profile = Maximum Performance; OS: RHEL 8.7 (kernel 4.18.0-425.3.1.el8.x86_64); OS options: processor.max_cstate=1; intel_idle.max_cstate=0; iommu=pt mitigations=off; clear caches; NUMA Balancing=0; randomize_va_space 0; THP=ON; CPU Governor=Performance
    • Top of stack Intel®: CPU: 2x Intel® Xeon® Platinum 8480+; Frequencies (base|boost): 1.90 GHz | 3.50 GHz (up to); Cores: 60 cores per socket (120 threads); L3: 112.5 MB per CPU; Memory: 1.0 TB (16x) Dual-Rank DDR5 4800 64 GB DIMMs 2 DPC; NIC: 25 Gb Ethernet CCX512-A ConnectX-5 (fw 16.35.2000); InfiniBand: 200 Gb HDR ConnectX-6 VPI (fw 20.35.2000); Storage: Samsung MZQL21T9HCJR-00A07 1.92 TB; BIOS: ESE110Q-1.10; BIOS options: Hyperthreading=Off, Profile = Maximum Performance; OS: RHEL 8.7 (kernel 4.18.0-425.3.1.el8.x86_64); OS options: processor.max_cstate=1; intel_idle.max_cstate=0; iommu=pt mitigations=off; clear caches; NUMA Balancing=0; randomize_va_space 0; THP=ON; CPU Governor=Performance
  • 32-core performance results: All of the following results may vary due to factors such as OS and BIOS versions and settings, use of production servers, and other variables.
    • ANSYS® LS-DYNA®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.41x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • Altair® Radioss™: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.28x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • Altair® AcuSolve®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.50x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • Ansys® CFX®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.56x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • Ansys® Fluent®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.28x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • OpenFOAM®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.48x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • GROMACS: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.08x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • SLB ECLIPSE®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.29x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • SLB INTERSECT®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.36x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • Shearwater® Reveal®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.10x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
    • WRF®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 32-core AMD EPYC™ 9374F processors delivers ~1.42x the performance of a system powered by 2P Intel® Xeon® Platinum 8462Y+ processors.
  • Top of stack: All of the following results may vary due to factors such as OS and BIOS versions and settings, use of production servers, and other variables.
    • ANSYS® LS-DYNA®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 96-core AMD EPYC™ 9654 processors delivers ~1.98x the performance of a system powered by 2P 60-core Intel® Xeon® Platinum 8490H processors.
    • ANSYS® CFX®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 96-core AMD EPYC™ 9654 processors delivers ~1.69x the performance of a system powered by 2P 60-core Intel® Xeon® Platinum 8490H processors.
    • ANSYS® Fluent®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 96-core AMD EPYC™ 9654 processors delivers ~1.82x the performance of a system powered by 2P 60-core Intel® Xeon® Platinum 8490H processors.
    • OpenFOAM®: Based on AMD internal testing as of 5/15/2023, a system powered by 2P 96-core AMD EPYC™ 9654 processors delivers ~1.52x the performance of a system powered by 2P 60-core Intel® Xeon® Platinum 8490H processors.
About the Author
Raghu Nambiar currently holds the position of Corporate Vice President at AMD, where he leads a global engineering team dedicated to shaping the software and solutions strategy for the company's datacenter business. Before joining AMD, Raghu served as the Chief Technology Officer at Cisco UCS, instrumental in driving its transformation into a leading datacenter compute platform. During his tenure at Hewlett Packard, Raghu made significant contributions as an architect, pioneering several groundbreaking solutions. He is the holder of ten patents, with several more pending approval, and has made extensive academic contributions, including publishing over 75 peer-reviewed papers and 20 books in the LNCS series. Additionally, Raghu has taken on leadership roles in various industry standards committees. Raghu holds dual Master's degrees from the University of Massachusetts and Goa University, complemented by completing an advanced management program at Stanford University.