4th Gen AMD EPYC™ Processors Outshine the Latest 5th Gen Intel® Xeon® Processors

raghu_nambiar

The recent launch of the 5th Gen Intel® Xeon® processors provides a timely opportunity to evaluate how they measure up against the 4th Gen AMD EPYC™ processors across a variety of critical workloads. AMD’s unwavering commitment to data center performance, power efficiency, and cost-effectiveness continues to outshine the competition. AMD EPYC processors boast over 300 world records for performance, and Intel's latest release does not diminish our leadership position. This success is supported by a robust AMD EPYC ecosystem, featuring over 250 server designs and 800 cloud instances.

 

I am pleased to announce that both the general-purpose and high-frequency 4th Gen AMD EPYC processors extend their competitive edge over the 5th Gen Intel Xeon Scalable processors in power efficiency and across a broad spectrum of workloads. These include foundational workloads, virtualized infrastructure, decision support systems, business applications, Computational Fluid Dynamics (CFD), Finite Element Analysis (FEA), molecular dynamics, quantum chemistry, and weather forecasting.

 

1. Power Efficiency with SPEC Power

Modern data centers are striving to meet rapidly growing demands while optimizing power usage to control costs and achieve sustainability goals. The SPECpower_ssj® 2008 benchmark from the Standard Performance Evaluation Corporation (SPEC®) provides a comparative measure of the energy efficiency of volume server class computers by evaluating both the power and performance characteristics of the System Under Test (SUT). The SPECpower® benchmark is the first industry-standard benchmark that evaluates the power and performance characteristics of single server and multi-node servers.

 

The SPECpower_ssj® 2008 benchmark’s defined measurement standard allows customers to compare energy efficiency across different configurations and servers. This benchmark is intended for use by hardware vendors, the IT industry, Original Equipment Manufacturers (OEMs), and governments. The SPECpower_ssj® 2008 metric, defined as “overall ssj_ops/watt,” indicates the power efficiency of the SUT. It is calculated as the ratio of the overall throughput ssj_ops, which is the sum of all ssj_ops scores for all target loads, to the sum of all power consumption averages in watts for all target loads.
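The metric definition above can be sketched in a few lines of Python. The function name `overall_ssj_ops_per_watt` and the sample load-level numbers are hypothetical and serve only to illustrate the arithmetic; they are not taken from any published result.

```python
from typing import Sequence, Tuple


def overall_ssj_ops_per_watt(levels: Sequence[Tuple[float, float]]) -> float:
    """Return sum(ssj_ops) / sum(average watts) over all measured load levels.

    Each entry is (ssj_ops, average_power_watts) for one target load.
    """
    total_ops = sum(ops for ops, _ in levels)
    total_watts = sum(watts for _, watts in levels)
    if total_watts <= 0:
        raise ValueError("total power must be positive")
    return total_ops / total_watts


# Hypothetical three-level run (illustrative values only).
sample_levels = [
    (1_000_000.0, 400.0),  # 100% target load
    (500_000.0, 250.0),    # 50% target load
    (100_000.0, 150.0),    # 10% target load
]
sample_score = overall_ssj_ops_per_watt(sample_levels)
print(f"{sample_score:.1f} overall ssj_ops/watt")
```

Because the metric is a ratio of sums rather than an average of per-level ratios, the load levels with the highest throughput and power draw carry the most weight.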

 

4th Gen AMD EPYC processors continue to lead the way when it comes to power efficiency. A dual-socket system powered by 128-core AMD EPYC 9754 processors delivered ~2.25x the overall ssj_ops/watt of a dual-socket system powered by 64-core Intel Xeon Platinum 8592+ processors.[1]

 

Figure 1: Relative SPECpower_ssj® 2008 performance

 

2. General Purpose Computing with SPEC CPU® 2017

The SPEC CPU® 2017 benchmark is one of the most popular industry standard benchmarks because its performance measurements are useful for comparing compute-intensive workloads by stressing the processor, memory subsystem, and compiler on different computer systems. SPEC CPU® 2017 contains 43 benchmarks organized into four suites, of which this blog focuses on two: SPECrate® 2017 Integer and SPECrate® 2017 Floating Point. Summary of the test results: 

 

  • 32 cores: A 2P 32-core AMD EPYC 9374F system has ~1.14x the performance of a 2P 32-core Intel Xeon Scalable 8562Y+ system on SPECrate® 2017_int_base and ~1.06x the performance on SPECrate® 2017_fp_base.[2][3]
  • 64 cores: A 2P 64-core AMD EPYC 9554 system has ~1.19x the performance of a 2P 64-core Intel Xeon Scalable 8592+ system on SPECrate® 2017_int_base and comparable performance on SPECrate® 2017_fp_base.[4][5]
  • Top of stack: A 2P 96-core general purpose AMD EPYC 9654 system has ~1.60x the performance of a 2P 64-core Intel Xeon Scalable 8592+ system on SPECrate® 2017_int_base and ~1.19x the performance on SPECrate® 2017_fp_base.[6][7]

 

Figure 2: SPECrate® 2017_int_base performance uplifts

 

Figure 3: SPECrate® 2017_fp_base performance uplifts
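The uplifts quoted for SPEC CPU® 2017 follow directly from the published scores listed in endnotes [2] to [7]. A minimal sketch of that arithmetic is shown below; the helper name `uplift` and the dictionary labels are illustrative, while the scores are copied from the endnotes.

```python
def uplift(amd_score: float, intel_score: float) -> float:
    """Relative performance for a higher-is-better score."""
    return amd_score / intel_score


# (AMD score, Intel score) pairs copied from endnotes [2]-[7].
published = {
    "int_base, 32 cores (9374F vs 8562Y+)": (827, 727),
    "fp_base, 32 cores (9374F vs 8562Y+)": (964, 908),
    "int_base, 64 cores (9554 vs 8592+)": (1340, 1130),
    "fp_base, 64 cores (9554 vs 8592+)": (1250, 1240),
    "int_base, top of stack (9654 vs 8592+)": (1810, 1130),
    "fp_base, top of stack (9654 vs 8592+)": (1480, 1240),
}

for label, (amd, intel) in published.items():
    print(f"{label}: ~{uplift(amd, intel):.2f}x")
```

The same ratio applies to any higher-is-better score, such as the SPECjbb® 2015 MultiJVM max-jOPS values in the next section.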

 

3. Server-side Java with SPECjbb® 2015

The SPECjbb® 2015 benchmark facilitates performance evaluations of server-side Java®-based applications by simulating a corporate environment with an IT setup that manages a blend of point-of-sale requests, online transactions, and data-mining tasks. The widespread adoption of Java makes this benchmark relevant to audiences that include Java Virtual Machine (JVM) vendors, hardware manufacturers, Java application developers, researchers, and academia. Summary of the test results: 

 

  • 64 cores: A dual-socket 64-core AMD EPYC 9554 system has comparable performance versus a dual-socket 64-core Intel Xeon Scalable 8592+ system on SPECjbb® 2015 MultiJVM-maxJOPS.[8]
  • Top of stack: A dual-socket 96-core general purpose AMD EPYC 9654 system has 1.48x the performance of a dual-socket 64-core Intel Xeon Scalable 8592+ system on SPECjbb® 2015 MultiJVM-maxJOPS.[9]

 

Figure 4: SPECjbb® 2015 MultiJVM-maxJOPS performance uplifts

 

4. Decision Support Systems with TPC Benchmark™ H

The TPC Benchmark™ H (TPC-H) is a decision support benchmark that evaluates the performance of systems that address complex business inquiries by executing intricate queries across extensive datasets. The queries and data manipulations contained in this benchmark are relevant to diverse industries.

 

TPC-H presents results using the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), which gauges the system's efficiency in processing queries. This metric considers database sizes, computational capabilities for handling query streams, and query throughput when managing concurrent user requests. TPC-H also assesses a system’s price-performance ratio, which offers insights into the balance between system cost and performance or, more colloquially, the “bang for the buck.”

 

A 2P 64-core AMD EPYC 9554 system delivers ~1.14x performance and ~1.22x price-performance TPC-H uplifts versus a 2P 64-core Intel Xeon Scalable 8592+ system at a scale factor of 10000 GB (SF 10000).[10]

 

Figure 5: TPC-H performance and price-performance @ SF10000
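The two published results in endnote [10] can be compared with simple arithmetic. The sketch below uses the QphH and USD-per-kQphH values from that endnote; the variable names are illustrative. Because a lower cost per kQphH is better, the price-performance gap can be expressed either as a cost ratio or as a percentage reduction. The source does not state which convention underlies its rounded price-performance figure, so both are computed.

```python
# Values copied from endnote [10] (TPC-H @ 10000 GB).
amd_qphh, amd_usd_per_kqphh = 2_720_098, 489.82
intel_qphh, intel_usd_per_kqphh = 2_391_511, 625.77

# Throughput: higher is better, so divide AMD by Intel.
perf_ratio = amd_qphh / intel_qphh

# Price-performance: lower cost per kQphH is better.
cost_ratio = intel_usd_per_kqphh / amd_usd_per_kqphh
cost_reduction = 1 - amd_usd_per_kqphh / intel_usd_per_kqphh

print(f"QphH ratio: ~{perf_ratio:.2f}x")
print(f"Cost-per-kQphH ratio (Intel/AMD): ~{cost_ratio:.2f}x")
print(f"Cost-per-kQphH reduction: ~{cost_reduction:.1%}")
```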

 

5. Business Applications with SAP SD 

SAP Sales and Distribution (SAP SD) is a pivotal logistics module within SAP ERP (Enterprise Resource Planning) software. The SAP-SD 2-Tier benchmark serves to assess hardware performance by measuring database efficiency in SAP Application Performance Standard units (SAPS). SAPS, a hardware-independent measure, quantifies system performance within the SAP environment, derived from the Sales and Distribution (SD) benchmark. Most recently, a 2P system equipped with 96-core general-purpose AMD EPYC 9654 processors showed ~1.53x the performance of a 2P system powered by 64-core Intel Xeon Platinum 8592+ processors.[11] For further insights, please see my recent blog post 4th Gen AMD EPYC™ CPUs Empower Leadership SAP® Sales & Distribution (SAP SD) 2-Tier Performance.

 

Figure 6 demonstrates the performance improvements attained with "top of stack" processor SKUs throughout the four generations of general-purpose AMD EPYC processors. It also highlights the consistent outperformance of 2nd Gen AMD EPYC processors and newer generations compared to corresponding generations of Intel Xeon processors.

 

Figure 6: SAP SD 2-Tier performance uplifts on 2P bare-metal systems

 

6. Private Cloud Infrastructure with VMmark

The VMmark® 3 benchmark suite is designed to comprehensively assess the performance, power efficiency, and scalability of virtualized servers, in private cloud environments, under significant loads across a designated set of physical hardware. It not only evaluates individual server capabilities but also facilitates comparisons between different virtualization platforms, offering valuable insights into their relative strengths and weaknesses. Summary of the test results:

 

  • 64 cores: A 2P 64-core AMD EPYC 9554 system has ~1.29x the performance of a 2P 64-core Intel Xeon Scalable 8592+ system on VMmark 3.[12]
  • Top of stack: A 2P 96-core general purpose AMD EPYC 9654 system has ~1.60x the performance of a 2P 64-core Intel Xeon Scalable 8592+ system on VMmark 3.[13]


Figure 7: VMmark 3 performance uplifts

 

7. High Performance Computing (HPC)

HPC pervades nearly every facet of modern life by contributing to vital functions, such as forecasting major climate events and enhancing the safety of transportation and infrastructure. It drives affordability by optimizing material usage in products, streamlining designs, and trimming development costs. It also expedites time to market via rapid virtual prototyping that sidesteps the need for lengthy and costly physical testing.

 

The demand for increasingly robust HPC workload performance is on the rise. Enhanced performance translates to swifter simulations, enabling quicker product development cycles, broader scenario testing, and finer model refinements, ultimately leading to products that are more effective and efficient than their predecessors. Let’s explore how 4th Gen AMD EPYC processors drive leadership HPC performance.

 

7.1 Computational Fluid Dynamics

Computational Fluid Dynamics (CFD) employs numerical analysis to simulate and examine fluid behavior, such as how water flows around a boat hull or how air flows around a vehicle or aircraft. Its applications span diverse fields, encompassing industrial processing and consumer goods. CFD tasks are computationally demanding but are most often constrained by memory bandwidth.

 

  • Altair® AcuSolve®: Altair® AcuSolve® is a proven asset for companies looking to explore designs by applying a full range of flow, heat transfer, turbulence, and non-Newtonian material analysis capabilities without the difficulties associated with traditional CFD applications. A 2P 32-core AMD EPYC 9374F system has a composite average of ~1.46x the performance of a 2P 32-core Intel Xeon Scalable 8562Y+ system running the acus-in test in AcuSolve.[14]

 

Figure 8: Altair AcuSolve performance uplift

 

  • Ansys® CFX®: Ansys® CFX® is a high-performance computational fluid dynamics (CFD) software tool that delivers robust, reliable, and accurate solutions quickly across a wide range of CFD and Multiphysics applications. A 32-core AMD EPYC 9374F system demonstrated ~1.48x the composite average performance of a 32-core Intel Xeon Scalable 8562Y+ system running select tests in CFX.[15]

 

Figure 9: Ansys CFX performance uplift

 

  • Ansys® Fluent®: Ansys® Fluent® is a fluid simulation application that offers advanced physics modeling capabilities and industry-leading accuracy. A 2P 32-core AMD EPYC 9374F system showed ~1.25x the composite average performance of a 2P 32-core Intel Xeon Scalable 8562Y+ system running select tests in Fluent.[16]

 

Figure 10: Ansys Fluent performance uplift

 

  • OpenFOAM®: OpenFOAM® is a free, open source CFD software. Its user base includes commercial and academic organizations. Running select tests within OpenFOAM, a 2P 64-core AMD EPYC 9554 system showed ~1.14x the composite average performance of a 2P 64-core Intel Xeon Scalable 8592+ system,[17] and a top-of-stack general purpose 2P 96-core AMD EPYC 9654 system had ~1.22x the composite average performance of the same 64-core Intel Xeon Scalable 8592+ system.[18]

 

Figure 11: OpenFOAM performance uplifts
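The "composite average" figures in this section can be reproduced from the per-test uplifts listed in the endnotes. The sketch below assumes the composite is a plain arithmetic mean of those rounded per-test values; the source does not name the averaging method, and the helper `composite_average` is hypothetical. The inputs are copied from endnotes [15], [17], and [18].

```python
from statistics import mean
from typing import Iterable


def composite_average(uplifts: Iterable[float]) -> float:
    """Arithmetic mean of per-test uplifts (assumed averaging method)."""
    return mean(list(uplifts))


# Per-test uplifts copied from endnotes [15], [17], and [18].
cfx_9374f = [1.54, 1.53, 1.54, 1.40, 1.37]
openfoam_9554 = [1.15, 1.12, 1.14]
openfoam_9654 = [1.27, 1.20, 1.18]

cfx_composite = composite_average(cfx_9374f)
openfoam_9554_composite = composite_average(openfoam_9554)
openfoam_9654_composite = composite_average(openfoam_9654)

print(f"Ansys CFX, EPYC 9374F: ~{cfx_composite:.2f}x")
print(f"OpenFOAM, EPYC 9554: ~{openfoam_9554_composite:.2f}x")
print(f"OpenFOAM, EPYC 9654: ~{openfoam_9654_composite:.2f}x")
```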

 

7.2 FEA Explicit

Explicit Finite Element Analysis (FEA) is a numerical simulation method that assesses the behavior of structures and materials under dynamic circumstances such as impacts, explosions, or crashes. The automotive industry uses FEA to predict vehicle behavior during collisions and to assess occupant safety. Cell phone manufacturers use FEA to simulate drop tests and ensure product durability. Employing simulations helps manufacturers cut costs and time by virtually testing designs rather than relying solely on physical prototypes.

These simulations involve complex digital models of the device under examination, such as a car or cell phone. These models simulate dynamic events like impacts by solving differential equations over time. The interactions between different parts of the model are scrutinized for potential deformations or failures. These calculations demand significant computational power and memory bandwidth. Moreover, the interconnected nature of the model makes communication between compute nodes crucial for exchanging information about how different parts of the model influence each other.

  • Ansys® LS-DYNA®: Ansys® LS-DYNA® is a widely used explicit simulation program. It is capable of simulating complex real-world short-duration events in the automotive, aerospace, construction, military, manufacturing, and bioengineering industries. A 2P 32-core AMD EPYC 9374F system demonstrated ~1.50x the composite average performance of a 2P 32-core Intel Xeon Scalable 8562Y+ system running select tests in LS-DYNA.[19]

 

Figure 12: Ansys LS-DYNA performance uplift

 

  • Altair® Radioss™: Altair® Radioss™ performs structural analyses under impact or crash conditions. Its benchmarks provide hardware performance data measured using sets of benchmark problems selected to represent typical usage. A 2P 32-core AMD EPYC 9374F system showed ~1.25x the composite average performance of a 2P 32-core Intel Xeon Scalable 8562Y+ system running select tests in Radioss.[20]

 

Figure 13: Altair Radioss performance uplift

 

 

7.3 Molecular Dynamics

Molecular dynamics is a computational technique that analyzes the movement and behavior of atoms and molecules by solving Newton's equations of motion. It provides researchers with valuable insights into molecular systems by exploring various phenomena, such as material behavior, protein folding, and chemical reactions. GROMACS is a molecular dynamics application that simulates the Newtonian equations of motion for systems with hundreds to millions of particles. 4th Gen AMD EPYC processors demonstrate superior performance compared to rival datacenter processors: a 2P 64-core AMD EPYC 9554 system showed ~1.24x the composite average performance of a 2P 64-core Intel Xeon Scalable 8592+ system.[21] Additionally, a top-of-stack 2P 96-core AMD EPYC 9654 system had ~1.63x the composite average performance of the same Intel processor system running select tests in GROMACS.[22]

 

Figure 14: GROMACS performance uplifts
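To make "solving Newton's equations of motion" concrete, the toy sketch below advances a single particle on a spring with the velocity Verlet scheme, a time-stepping method widely used in molecular dynamics. It is not GROMACS code and does not reflect its force fields or parallelization; the function names, spring constant, and time step are assumptions chosen for illustration.

```python
import math
from typing import Callable, Tuple


def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, steps: int) -> Tuple[float, float]:
    """Advance one particle in 1D by `steps` velocity Verlet time steps."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return x, v


SPRING_K = 1.0  # hypothetical spring constant


def spring_force(x: float) -> float:
    """Hooke's law restoring force, F = -k x."""
    return -SPRING_K * x


# Start at x = 1, at rest, and integrate to t = 10 with dt = 0.01.
x_end, v_end = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                               mass=1.0, dt=0.01, steps=1000)
energy_start = 0.5 * SPRING_K * 1.0 ** 2
energy_end = 0.5 * 1.0 * v_end ** 2 + 0.5 * SPRING_K * x_end ** 2
print(f"x(10) = {x_end:.4f}, exact = {math.cos(10.0):.4f}")
print(f"energy drift = {abs(energy_end - energy_start):.2e}")
```

A production simulation repeats the same kind of step for every particle, with forces evaluated between many interacting atoms at each step, which is where most of the compute time goes.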

 

 

7.4 Quantum Chemistry

Quantum chemistry sheds light on bonding and reactions by exploring molecular properties like structure, energetics, and reactivity. CP2K is a quantum chemistry and solid-state physics tool that simulates various atomic-level systems. In the H2O-dft-ls test, a 2P 64-core AMD EPYC 9554 system has ~1.27x the performance of a 2P 64-core Intel Xeon Scalable 8592+ system, while a top-of-stack 2P 96-core AMD EPYC 9654 system showed ~1.62x the performance of the same Intel processor system.[23]

 

Figure 15: CP2K performance uplifts

 

 

7.5 Weather Forecasting

Weather forecasting is integral to our daily lives. The Weather Research & Forecasting® (WRF®) model, developed and maintained by the National Center for Atmospheric Research (NCAR), boasts over 48,000 registered users across 160 countries. This flexible and computationally efficient platform enables operational forecasting across various scales, from meters to thousands of kilometers. In the conus2.5km test, a 2P 64-core AMD EPYC 9554 system showed ~1.30x the average performance of a 2P 64-core Intel Xeon Scalable 8592+ system,[25] while a top-of-stack 2P 96-core AMD EPYC 9654 system had ~1.50x the average performance of the same Intel system.[26]


Figure 16: WRF performance uplifts

 

8. Conclusion

The analysis in this blog demonstrates how 4th Gen AMD EPYC processors excel across key workloads by outperforming the latest 5th Gen Intel Xeon processors. The comparisons presented above highlight the robust performance and reliability of 4th Gen AMD EPYC processors across a diverse range of tasks critical to various industries. 4th Gen AMD EPYC processors consistently deliver exceptional results, from powering virtualized environments to facilitating complex simulations in molecular dynamics and quantum chemistry. All major operating systems, hypervisors, and leading cloud service providers support AMD EPYC processors. Users can leverage the extensive resources available through the AMD Documentation Hub to achieve optimal workload performance, including BIOS and OS tunings for various workloads.

 

Endnotes

  1. SP5-011F: SPECpower_ssj® 2008 comparison based on published 2P server results as of 1/12/2024. Configurations: 2P 128-core AMD EPYC 9754 (36,210 overall ssj_ops/W, 2U, https://spec.org/power_ssj2008/results/res2024q1/power_ssj2008-20231205-01347.html) is 2.25x the performance of best published 2P 64-core Intel Xeon® Platinum 8592+ (16,106 overall ssj_ops/W, 2U, https://spec.org/power_ssj2008/results/res2024q1/power_ssj2008-20231205-01349.html). SPEC®, SPECpower®, and SPECpower_ssj® are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
  2. SPECrate® 2017_int_base results @ 32 cores:
    - 2P EPYC 9374F, Score 827, https://www.spec.org/cpu2017/results/res2023q1/cpu2017-20230313-34768.html
    - 2P Xeon 8562Y+, Score 727, https://www.spec.org/cpu2017/results/res2024q2/cpu2017-20240408-42752.html
  3. SPECrate® 2017_fp_base results @ 32 cores:
    - 2P EPYC 9374F, Score 964, https://www.spec.org/cpu2017/results/res2023q1/cpu2017-20230313-34766.html
    - 2P Xeon 8562Y+, Score 908, https://www.spec.org/cpu2017/results/res2024q2/cpu2017-20240408-42750.html
  4. SPECrate® 2017_int_base results @ 64 cores:
    - 2P EPYC 9554, Score 1340, https://www.spec.org/cpu2017/results/res2023q2/cpu2017-20230327-34992.html
    - 2P Xeon 8592+, Score 1130, https://www.spec.org/cpu2017/results/res2023q4/cpu2017-20231127-40064.html
  5. SPECrate® 2017_fp_base results @ 64 cores:
    - 2P EPYC 9554, Score 1250, https://www.spec.org/cpu2017/results/res2024q1/cpu2017-20240129-40783.html
    - 2P Xeon 8592+, Score 1240, https://www.spec.org/cpu2017/results/res2023q4/cpu2017-20231127-40063.html
  6. SPECrate® 2017_int_base results @ top-of-stack:
    - 2P EPYC 9654, Score 1810, https://www.spec.org/cpu2017/results/res2024q1/cpu2017-20240129-40896.html
    - 2P Xeon 8592+, Score 1130, https://www.spec.org/cpu2017/results/res2023q4/cpu2017-20231127-40064.html
  7. SPECrate® 2017_fp_base results @ top-of-stack:
    - 2P EPYC 9654, Score 1480,  https://www.spec.org/cpu2017/results/res2024q1/cpu2017-20240111-40517.html
    - 2P Xeon 8592+, Score 1240, https://www.spec.org/cpu2017/results/res2023q4/cpu2017-20231127-40063.html
  8. SPECjbb® 2015 MultiJVM-maxJOPS results @ 64 cores:
    - 2P EPYC 9554, Score 559214 (266053 MultiJVM critical-jOPS), https://spec.org/jbb2015/results/res2022q4/jbb2015-20221005-00853.html
    - 2P Xeon 8592+, Score 558626 (297519 MultiJVM critical-jOPS), https://spec.org/jbb2015/results/res2024q1/jbb2015-20240110-01213.html
  9. SPECjbb® 2015 MultiJVM-maxJOPS results @ top-of-stack:
    - 2P EPYC 9654, Score 828952 (365847 MultiJVM critical-jOPS),  https://spec.org/jbb2015/results/res2023q2/jbb2015-20230419-01034.html
    - 2P Xeon 8592+, Score 558626 (297519 MultiJVM critical-jOPS), https://spec.org/jbb2015/results/res2024q1/jbb2015-20240110-01213.html
  10. TPC-H performance and price-performance, 64 cores @ SF10000:
    - https://www.tpc.org/3391 (2P EPYC 9554 system) 2,720,098 QphH@10000GB, 489.82 USD per kQphH@10000GB.
    - https://www.tpc.org/3389 (2P Xeon 8592+ system) 2,391,511 QphH@10000GB, 625.77 USD per kQphH@10000GB.
  11. SAP SD 2-Tier performance uplifts on 2P bare-metal systems:
    - Intel® Xeon® Platinum 8180 SAP SD 2-Tier Benchmark Users 32,086: https://www.sap.com/dmc/benchmark/2018/Cert18060.pdf
    - AMD EPYC™ 7601 SAP SD 2-Tier Benchmark Users 28,000: https://www.sap.com/dmc/benchmark/2018/Cert18014.pdf
    - Intel® Xeon® Platinum 8280 SAP SD 2-Tier Benchmark Users 35,505: https://www.sap.com/dmc/benchmark/2019/Cert19026.pdf 
    - AMD EPYC™ 7742 SAP SD 2-Tier Benchmark Users 62,500: https://www.sap.com/dmc/benchmark/2019/Cert19047.pdf
    - Intel® Xeon® Platinum 8380 SAP SD 2-Tier Benchmark Users 48,000: https://www.sap.com/dmc/benchmark/2023/Cert23019.pdf
    - AMD EPYC™ 7763 SAP SD 2-Tier Benchmark Users 75,000: https://www.sap.com/dmc/benchmark/2021/Cert21021.pdf
    - Intel® Xeon® Platinum 8480+ SAP SD 2-Tier Benchmark Users 78,387: https://www.sap.com/dmc/benchmark/2023/Cert23027.pdf
    - Intel® Xeon® Platinum 8592+ SAP SD 2-Tier Benchmark Users 96,740: https://www.sap.com/dmc/benchmark/2023/Cert23077.pdf
    - AMD EPYC™ 9654 SAP SD 2-Tier Benchmark Users 148,000: https://www.sap.com/dmc/benchmark/2022/Cert22029.pdf
  12.  VMmark 3 performance @ 64 cores:
    - 2P EPYC 9554, 32.79 @ 32 tiles, https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vmmark/2023-11-14-Supermicro-AS-21...
    - 2P Xeon 8592+, , https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vmmark/2024-02-20-Dell-PowerEdge-R...
  13. VMmark 3 performance @ top-of-stack:
    - 2P EPYC 9654, 40.66 @ 42 tiles, https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vmmark/2023-06-13-Lenovo-ThinkSyst...
    - 2P Xeon 8592+,  https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vmmark/2024-02-20-Dell-PowerEdge-R...
  14. SP5-237: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9374F 32-Core Processor, and the INTEL® XEON® PLATINUM 8562Y+ running the following test on Altair® AcuSolve® 2022.2: * acus-in: ~1.46x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 32-Core AMD EPYC™ 9374F ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 32-Core INTEL® XEON® PLATINUM 8562Y+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  15. SP5-245: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9374F 32-Core Processor, and the INTEL® XEON® PLATINUM 8562Y+ running select tests on Ansys® CFX® V231. Uplifts for the performance metric normalized to the INTEL® XEON® PLATINUM 8562Y+ follow for each benchmark: * Airfoil 100: ~1.54x, * Airfoil 10: ~1.53x, * Airfoil 50: ~1.54x, * LeMans Car: ~1.40x, * Automotive Pump: ~1.37x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 32-Core AMD EPYC™ 9374F ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 32-Core INTEL® XEON® PLATINUM 8562Y+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  16. SP5-244: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Core Solver Rating) of this benchmark for the AMD EPYC™ 9374F 32-Core Processor, and the INTEL® XEON® PLATINUM 8562Y+ running select tests on Ansys® Fluent®: * aircraft_14m: ~1.33x, * aircraft_2m: ~1.26x, * combustor_12m: ~1.22x, * combustor_71m: ~1.27x, * exhaust_system_33m: ~1.22x, * f1_racecar-140m: ~1.29x, * fluidized_bed_2m: ~1.10x, * Fluent®-ice2: ~1.19x, * landing_gear_15m: ~1.23x, * LeMans_6000_16m: ~1.21x, * oil_rig_7m: ~1.03x, * f1_racecar_280m: ~1.24x, * pump_2m: ~1.43x, * rotor_3m: ~1.34x, * sedan_4m: ~1.39x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 32-Core AMD EPYC™ 9374F ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 32-Core INTEL® XEON® PLATINUM 8562Y+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  17. SP5-247: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9554 64-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running select tests on Open-Source OpenFOAM® 2212: * ofoam-1004040: ~1.15x, * ofoam-1084646: ~1.12x, * ofoam-1305252: ~1.14x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 64-Core AMD EPYC™ 9554 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  18. SP5-248: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9654 96-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running select tests on Open-Source OpenFOAM® 2212. Uplifts for the performance metric normalized to the INTEL® XEON® PLATINUM 8592+ follow for each benchmark: * ofoam-1004040: ~1.27x, * ofoam-1084646: ~1.20x, * ofoam-1305252: ~1.18x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 96-Core AMD EPYC™ 9654 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  19. SP5-246: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9374F 32-Core Processor, and the INTEL® XEON® PLATINUM 8562Y+ running select tests on Ansys® LS-DYNA® R13_1_1: * ls-3cars: ~1.61x, * ls-car2car: ~1.48x, * ls-neon: ~1.67x, * ls-odb10m-short: ~1.25x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 32-Core AMD EPYC™ 9374F ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 32-Core INTEL® XEON® PLATINUM 8562Y+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  20. SP5-249: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9374F 32-Core Processor, and the INTEL® XEON® PLATINUM 8562Y+ running select tests on Altair® RADIOSS™ 2022.2. * Dropsander: ~1.28x, * Neon: ~1.34x, * Taurus FFB50 (T10M): ~1.17x, * Venza Battery: ~1.21x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 32-Core AMD EPYC™ 9374F ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 32-Core INTEL® XEON® PLATINUM 8562Y+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  21. SP5-242: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (ns/day) of this benchmark for the AMD EPYC™ 9554 64-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running select tests on Open-Source GROMACS. * benchPEP: ~1.30x, * gmx_water1536K_PME: ~1.17x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 64-Core AMD EPYC™ 9554 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  22. SP5-243: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (ns/day) of this benchmark for the AMD EPYC™ 9654 96-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running select tests on Open-Source GROMACS. * benchPEP: ~1.70x, * gmx_water1536K_PME: ~1.56x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 96-Core AMD EPYC™ 9654 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  23. SP5-238: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9554 64-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running the following test on Open-Source cp2k. * H2O-dft-ls: ~1.27x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 64-Core AMD EPYC™ 9554 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  24. SP5-239: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for the AMD EPYC™ 9654 96-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running the following test on Open-Source cp2k. Uplifts for the performance metric normalized to the INTEL® XEON® PLATINUM 8592+ follow for each benchmark: * H2O-dft-ls: ~1.62x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 96-Core AMD EPYC™ 9654 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  25. SP5-240: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Mean Time/Step) of this benchmark for the AMD EPYC™ 9554 64-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running the following test on Open-Source WRF® 4.2.1. * conus2.5km: ~1.30x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 64-Core AMD EPYC™ 9554 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
  26. SP5-241: AMD testing as of 04/23/2024. The detailed results show the average uplift of the performance metric (Mean Time/Step) of this benchmark for the AMD EPYC™ 9654 96-Core Processor, and the INTEL® XEON® PLATINUM 8592+ running the following test on Open-Source WRF® 4.2.1: * conus2.5km: ~1.50x. AMD System Configuration: Server: AMD Titanite; Processors: 2 x 96-Core AMD EPYC™ 9654 ; Memory: 24x 64GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; BIOS: RTI1009C; BIOS Settings from Default; SMT=Off, NPS=4, Determinism=Power; OS: RHEL 9.3; Kernel: Linux 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: amd_iommu=on, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance, Disable C2 States. Intel System Configuration: Server: Lenovo Thinksystem SR650 V3; Processors: 2 x 64-Core INTEL® XEON® PLATINUM 8592+; Memory: 16x 64GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; BIOS: ESE122V-3.10; BIOS Settings from Default: Hyperthreading=Off, Profile=Maximum Performance Profile; OS: RHEL 9.3; Kernel: 5.14.0-362.8.1.el9_3.x86_64; Kernel CMDLINE: processor.max_cstate=1, Intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Tunings: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance. Results may vary based on system configurations, software versions, and BIOS settings.
About the Author
Raghu Nambiar is a Corporate Vice President at AMD, where he leads a global engineering team dedicated to shaping the software and solutions strategy for the company's datacenter business. Before joining AMD, Raghu served as the Chief Technology Officer at Cisco UCS, where he was instrumental in driving its transformation into a leading datacenter compute platform. During his tenure at Hewlett Packard, Raghu made significant contributions as an architect, pioneering several groundbreaking solutions. He holds ten patents, with several more pending approval, and has made extensive academic contributions, including publishing over 75 peer-reviewed papers and 20 books in the LNCS series. Additionally, Raghu has taken on leadership roles in various industry standards committees. Raghu holds Master's degrees from the University of Massachusetts and Goa University, and has completed an advanced management program at Stanford University.