5th Gen AMD EPYC™ Processors Elevate HPC and AI Workloads to New Heights

The momentum surrounding the 5th generation of AMD EPYC™ processors has exceeded all expectations. My previous blog explored enterprise and cloud workloads. Now, as promised, I’m excited to follow up and explore how these processors are transforming HPC and AI. Let’s take a closer look at how 5th Gen AMD EPYC processors accelerate performance in these critical areas.

Commercial High-Performance Computing (HPC)

HPC touches our lives in many ways, from weather forecasting to manufacturing and life sciences. These applications require the absolute best performance available. HPC has been a major focus for AMD since 1st Gen AMD EPYC processors launched in 2017, with our customers consistently valuing the performance and efficiency these processors provide.

The ongoing AMD commitment to innovation delivers exceptional performance across multiple technical computing sectors, including enterprises and national laboratories. In this blog, I will highlight the generational and competitive performance improvements for HPC workloads by comparing 5th Gen AMD EPYC processors versus both 4th Gen AMD EPYC and 5th Gen Intel® Xeon® Platinum processors.

AMD EPYC 9005 Series Processors offer from 8 to 192 cores and TDPs ranging from 155 to 500W. I’ve selected the 5th Gen 64-core high-frequency AMD EPYC 9575F for commercial HPC applications because this processor leads the industry by breaking the 5 GHz max boost frequency barrier.[1] I’m referencing a lower core-count processor for commercial applications because many of them are licensed on a per-core basis, which makes performance per core a key metric for system purchases and overall TCO. By contrast, I’ve opted for CPUs with higher core counts to show overall throughput performance for open-source HPC applications that do not have per-core license concerns.

Computational Fluid Dynamics

Altair® AcuSolve® is a powerful tool for companies aiming to explore designs through comprehensive analysis of flow, heat transfer, turbulence, and non-Newtonian materials, all without the complexities of traditional CFD applications.

The 5th Gen AMD EPYC 9575F processor delivers ~1.62x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9554 processor delivers ~1.23x the performance of the same Intel processor, as illustrated in Figure 1.[2][3] This is a good example of how our higher-frequency options serve as a segment-oriented optimization that lets customers get more value from applications licensed on a per-core basis.

 Figure 1: Altair AcuSolve
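
To make the per-core licensing argument concrete, here is a minimal sketch of the arithmetic. It assumes a simple model in which license spend scales with core count and work done scales with relative performance; the function name and the normalized license price are hypothetical and are not taken from any vendor's price list. Both tested systems have 128 total cores, per references 2 and 3.

```python
def license_cost_per_job(cores: int, license_cost_per_core: float,
                         relative_throughput: float) -> float:
    """License spend divided by work done, in arbitrary normalized units."""
    return cores * license_cost_per_core / relative_throughput


CORES = 128              # 2P x 64 cores, identical for both systems
LICENSE_PER_CORE = 1.0   # hypothetical normalized price, for illustration only

baseline = license_cost_per_job(CORES, LICENSE_PER_CORE, 1.00)    # Xeon 8592+
epyc_9575f = license_cost_per_job(CORES, LICENSE_PER_CORE, 1.62)  # AcuSolve uplift

ratio = epyc_9575f / baseline
print(f"{ratio:.2f}")  # 0.62
```

Under these assumptions, the same license spend covers about 1.62x as much work, so the license cost per job falls to roughly 0.62 of the baseline. Real license terms vary and may not scale linearly with core count.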

Ansys® CFX® is a high-performance computational fluid dynamics (CFD) software that provides robust, reliable, and accurate solutions swiftly across a broad spectrum of CFD and multi-physics applications.

The 5th Gen AMD EPYC 9575F processor delivers ~1.54x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9554 processor delivers ~1.20x the performance of the same Intel processor, as illustrated in Figure 2.[4][5]

 Figure 2: Ansys CFX

Ansys® Fluent® is a fluid simulation application renowned for its advanced physics modeling capabilities and industry-leading accuracy.

The 5th Gen AMD EPYC 9575F processor delivers ~1.57x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9554 processor delivers ~1.21x the performance of the same Intel processor, as illustrated in Figure 3.[6][7]

 Figure 3: Ansys Fluent

Advanced Physics Modeling with Ansys® LS-DYNA®

Ansys® LS-DYNA® is a widely utilized explicit simulation program designed to model complex, short-duration events across industries such as automotive, aerospace, construction, military, manufacturing, and bioengineering.

The 5th Gen AMD EPYC 9575F processor delivers ~1.63x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9554 processor delivers ~1.30x the performance of the same Intel processor, as illustrated in Figure 4.[8][9]

 Figure 4: Ansys LS-DYNA
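
The headline figures in this blog are averages of per-benchmark uplifts. As a rough illustration, the sketch below takes the per-test LS-DYNA uplifts listed in references 8 and 9 and computes their arithmetic mean. The references say "average" without naming the type of mean, so the arithmetic mean is an assumption. The helper `uplift_from_elapsed` is also hypothetical: it shows how an uplift would be derived for an elapsed-time metric, where lower is better.

```python
from statistics import mean

# Per-benchmark uplifts versus the Xeon Platinum 8592+ (references 8 and 9).
ls_dyna_uplifts = {
    "EPYC 9575F": {"Neon": 1.68, "Car2Car": 1.72, "3 Cars": 1.49, "ODB 10m": 1.63},
    "EPYC 9554": {"Neon": 1.40, "Car2Car": 1.30, "3 Cars": 1.27, "ODB 10m": 1.22},
}


def uplift_from_elapsed(baseline_seconds: float, candidate_seconds: float) -> float:
    """Uplift for a lower-is-better metric: baseline time over candidate time."""
    return baseline_seconds / candidate_seconds


averages = {cpu: mean(tests.values()) for cpu, tests in ls_dyna_uplifts.items()}
for cpu, avg in averages.items():
    print(f"{cpu}: ~{avg:.2f}x")
# EPYC 9575F: ~1.63x
# EPYC 9554: ~1.30x
```

For LS-DYNA the arithmetic mean of the rounded per-test values reproduces the quoted ~1.63x and ~1.30x. For other workloads the same calculation can differ in the last digit, presumably because the published per-test values are themselves rounded.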

Structural Analysis with Altair® Radioss™

Altair® Radioss™ is designed for structural analysis under impact or crash conditions, offering benchmarks that measure hardware performance through a series of representative problems.

The 5th Gen AMD EPYC 9575F processor delivers ~1.58x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9554 processor delivers ~1.18x the performance of the same Intel processor, as illustrated in Figure 5.[10][11]

 Figure 5: Altair Radioss

Open-Source High Performance Computing

The previous section showcased the exceptional performance of our AMD EPYC processors for commercial HPC workloads. This section explores the performance of key open-source workloads, including weather forecasting, quantum chemistry, molecular dynamics, and computational fluid dynamics. Open-source software typically incurs no licensing costs, which means that customers can more freely scale their performance by utilizing additional cores. We will compare generational performance improvements across these vital HPC workloads by primarily focusing on the top-tier 5th Gen AMD EPYC and 4th Gen AMD EPYC processors versus the 5th Gen Intel Xeon Platinum 8592+.

Weather Forecasting

The Weather Research and Forecasting (WRF) model is an advanced mesoscale numerical weather prediction system tailored for both atmospheric research and operational forecasting. It includes two dynamic cores, a data assimilation system, and a software architecture that enables parallel computation and easy extensibility. WRF is versatile, supporting various meteorological applications across scales ranging from tens of meters to thousands of kilometers.

The 5th Gen AMD EPYC 9755 processor delivers ~2.19x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 processor delivers ~1.56x the performance of the same Intel processor, as illustrated in Figure 6. Please see WRF® on 5th Gen AMD EPYC™ Processors for additional information.

 Figure 6: WRF

Molecular Dynamics

GROMACS is a molecular dynamics application that simulates Newtonian motion for systems ranging from hundreds to millions of particles.

The 5th Gen AMD EPYC 9755 processor delivers ~3.23x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 processor delivers ~1.92x the performance of the same Intel processor, as illustrated in Figure 7.[12][13]

 Figure 7: Relative GROMACS performance

LAMMPS is a classical molecular dynamics application with a focus on materials modeling. It is used for both solid-state materials and biosciences.

The 5th Gen AMD EPYC 9755 processor delivers ~2.19x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 processor delivers ~1.53x the performance of the same Intel processor, as illustrated in Figure 8.[14][15]

 Figure 8: LAMMPS

Nanoscale Molecular Dynamics

Nanoscale Molecular Dynamics (NAMD) conducts high-performance simulations of large biomolecular systems, encompassing system preparation, analysis, and result interpretation.

The NAMD STMV-20M benchmark was used to compare system performance. The 5th Gen AMD EPYC 9755 processor delivers ~3.24x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 processor delivers ~1.93x the performance of the same Intel processor, as illustrated in Figure 9.[16][17]

 Figure 9: NAMD

Materials Modeling

Quantum ESPRESSO is an open-source suite for nanoscale electronic-structure calculations and materials modeling using density-functional theory, plane waves, and pseudopotentials. The Quantum ESPRESSO 7.0 ausurf benchmark was used to compare system performance.

The 5th Gen AMD EPYC 9755 processor delivers ~2.09x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 processor delivers ~1.15x the performance of the same Intel processor, as illustrated in Figure 10.[18][19]

 Figure 10: Quantum ESPRESSO

OpenFOAM®

OpenFOAM® is a free, open-source CFD software widely used by both commercial and academic organizations.

The 5th Gen AMD EPYC 9755 processor delivers ~1.67x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 processor delivers ~1.17x the performance of the same Intel processor, as illustrated in Figure 11. Please see OpenFOAM® on 5th Gen AMD EPYC™ Processors for additional information.

 Figure 11: OpenFOAM

Uncompromised AI

AI advancements are revolutionizing how we live and work. AMD leads the way with a diverse range of compute engines and technologies optimized for efficient AI platforms, from edge devices to datacenters. This includes the AMD Ryzen™ 7040 Series Processors, AMD Versal™ adaptive SoCs, AMD EPYC processors, and AMD Instinct™ MI accelerators. 5th Gen AMD EPYC processors are particularly notable for their high core densities, extensive memory bandwidth, and exceptional efficiency, a combination that makes them well suited to enterprise AI tasks and lets them outperform 5th Gen Intel Xeon processors in the workloads shown below. The continual advancement of AI technology makes choosing the right hardware essential for achieving peak performance. AMD stands out as the top choice by delivering superior solutions that enhance AI capabilities across the entire technology spectrum from edge to datacenters.

End-to-End AI

The Transaction Processing Performance Council (TPC) TPCx-AI benchmark assesses the complete AI pipeline using a comprehensive dataset from a retail data center, covering key business data such as customer details, orders, financials, and product information. It includes diverse enterprise use cases like customer segmentation, conversation transcription, sales forecasting, spam detection, price prediction, classification, and fraud detection.

The 5th Gen AMD EPYC 9965 processor delivers ~3.77x and the 4th Gen AMD EPYC 9654 processor delivers ~1.66x the performance of the 5th Gen Intel Xeon Platinum 8592+, as illustrated in Figure 12.[20]

 Figure 12: End-to-end AI

Gradient Boosting

Gradient boosting is a machine learning technique for regression and classification. XGBoost (eXtreme Gradient Boosting) is a popular, efficient open-source implementation that manages large datasets, supports parallel processing for fast training, and effectively handles missing values. Its performance and versatility make it suitable for various applications, as demonstrated by the use case models and datasets in its repository.

The 5th Gen AMD EPYC 9965 processor delivers ~3.00x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 delivers ~1.20x the performance of the same Intel processor, as illustrated in Figure 13.[21]

 Figure 13:  Gradient Boosting
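
For readers new to the technique, the following is a minimal sketch of gradient boosting for regression with squared loss, using one-split "stumps" on a single feature. It illustrates the general idea only and is not how XGBoost is implemented, which is considerably more elaborate. All function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    """Fit an additive model: each stump is trained on the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


xs, ys = [1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0]   # toy data with one clear step
model = gradient_boost(xs, ys, rounds=100)
print(round(model(1), 3), round(model(4), 3))  # close to 1.0 and 3.0
```

Each round shrinks the remaining error by a fixed fraction set by the learning rate, which is why many small steps are typically used rather than a few large ones.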

Similarity Search

Similarity search is a key data retrieval and mining task that aims to find objects most similar to a given query based on defined similarity measures. The Facebook AI Similarity Search (FAISS) library facilitates fast, scalable searches for similar multimedia files. It surpasses traditional databases by enabling k-nearest-neighbor (KNN) searches across large datasets with a balance of memory, speed, and accuracy.

The 5th Gen AMD EPYC 9965 processor delivers ~3.64x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 delivers ~2.52x the performance of the same Intel processor, as illustrated in Figure 14.[22]

 Figure 14: Similarity Search
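
As a point of reference for what a nearest-neighbor search computes, here is a minimal brute-force sketch in plain Python. It does not use the FAISS API; the function name and the toy vectors are hypothetical, and Euclidean (L2) distance is assumed as the similarity measure.

```python
import heapq
import math


def knn_search(database, query, k):
    """Return the k (distance, index) pairs closest to `query` under L2 distance."""
    scored = ((math.dist(vec, query), idx) for idx, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = knn_search(vectors, (1.0, 1.0), k=2)
print([idx for _, idx in neighbors])  # [1, 3]
```

This exhaustive scan compares the query against every stored vector, so its cost grows linearly with the dataset. Libraries such as FAISS exist largely to make this kind of search practical at much larger scale.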

Multi-Task Learning

Multi-task learning (MTL) is a machine learning approach where a single model is trained to handle multiple tasks simultaneously. This method enhances overall performance by sharing knowledge across tasks instead of training separate models for each one. The Multi-gate Mixture-of-Experts (MMoE) architecture advances MTL by incorporating multiple gating mechanisms, allowing it to manage diverse and interconnected tasks more effectively.

The 5th Gen AMD EPYC 9965 processor delivers ~2.99x the performance of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 delivers ~1.41x the performance of the same Intel processor, as illustrated in Figure 15.[23]

 Figure 15: Multi-task Learning
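
The gating idea behind MMoE can be sketched in a few lines. The example below is a deliberately simplified forward pass: each "expert" is a single linear function producing one number, each task has its own softmax gate over the shared experts, and there are no task-specific towers. All names and weights are hypothetical and chosen only to make the mixing visible.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one output per task: a gate-weighted mix of shared expert outputs."""
    expert_outputs = [dot(w, x) for w in expert_weights]  # shared by all tasks
    task_outputs = []
    for task_gate in gate_weights:                        # one gate per task
        mix = softmax([dot(g, x) for g in task_gate])     # weights over experts
        task_outputs.append(dot(mix, expert_outputs))
    return task_outputs


x = [1.0, 2.0]
experts = [[1.0, 0.0], [0.0, 1.0]]         # expert outputs: 1.0 and 2.0
gates = [
    [[0.0, 0.0], [0.0, 0.0]],              # task A: equal weight on both experts
    [[10.0, 0.0], [0.0, 0.0]],             # task B: strongly prefers expert 0
]
outputs = mmoe_forward(x, experts, gates)
print([round(o, 3) for o in outputs])      # [1.5, 1.0]
```

The two tasks share the same experts but combine them differently, which is the property that lets one model serve related tasks without forcing them to use identical features.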

Large Language Models (LLM)

LLMs are advanced artificial intelligence models designed to understand and generate human-like responses based on the input they receive. LLMs are trained on vast amounts of text data, allowing them to learn grammar, context, and even nuances of language. People typically think of LLMs as an area where high-performance GPUs are required; however, it is becoming increasingly apparent that high-performance CPUs such as 5th Gen AMD EPYC processors deliver strong performance on smaller LLMs.

GPT-J is a six-billion-parameter open-source English autoregressive language model trained on the Pile dataset. AMD used multiple instances of this LLM to demonstrate aggregated inference performance (total tokens per second). The “Summary” use case involves submitting 1024-token input prompts and the model returning a summary in 128-token responses. The “Translate” use case involves submitting 1024-token input prompts and the model returning a translation in 1024-token responses. For the “Translate” use case on the GPT-J BF16 model at batch-size=16, the 5th Gen AMD EPYC 9965 processor delivers up to ~2.84x the tokens per second of the 5th Gen Intel Xeon Platinum 8592+ processor, and the 4th Gen AMD EPYC 9654 delivers ~1.68x the tokens per second of the same Intel processor. See Figure 16.[24]

 Figure 16: GPT-J
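
To show what "aggregated inference performance" means in practice, here is a minimal sketch that sums tokens per second across several concurrent model instances. It assumes that only generated (output) tokens are counted, which the text above does not specify. The class, the function, and the timing values are hypothetical; only the batch size of 16 and the 1024-token "Translate" response length come from the description above.

```python
from dataclasses import dataclass


@dataclass
class InstanceRun:
    batch_size: int          # prompts processed together by one instance
    output_tokens: int       # tokens generated per prompt
    elapsed_seconds: float   # wall-clock time for the whole batch

    def tokens_per_second(self) -> float:
        return self.batch_size * self.output_tokens / self.elapsed_seconds


def aggregate_throughput(runs) -> float:
    """Total tokens per second across all concurrently running instances."""
    return sum(run.tokens_per_second() for run in runs)


# Two hypothetical "Translate" instances: batch of 16, 1024-token responses.
translate_runs = [
    InstanceRun(batch_size=16, output_tokens=1024, elapsed_seconds=64.0),
    InstanceRun(batch_size=16, output_tokens=1024, elapsed_seconds=128.0),
]
print(aggregate_throughput(translate_runs))  # 384.0
```

Running several instances side by side is a common way to keep a high core-count CPU busy, since a single instance of a smaller model may not scale across every core.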

Host Processor

The most demanding AI workloads require high performance AI acceleration technologies. A host processor acts as the primary CPU in a multi-GPU server and plays a vital role in running the operating system and managing the execution of applications and services. Choosing the right CPU for your multi-GPU server is therefore essential for achieving optimal performance and maximizing the return on your GPU investments.

This section delves into how the host processor accelerates GPU computing. Every accelerated application involves kernel launches, data transfers between the host CPU and GPUs, and preprocessing data for GPU consumption. AMD developed specific workloads, publicly available at Host_Processor_Microbenchmarks*, to assess the performance of kernel launches and data transfers. AMD employed the widely recognized Grok-1 model from xAI to evaluate the preprocessing aspect.

The tested configurations are a server powered by two 5th Gen AMD EPYC 9575F processors and a server powered by two 5th Gen Intel Xeon Platinum 8592+ processors, each equipped with eight AMD Instinct MI300 Series GPUs. Figure 17 shows the 5th Gen AMD EPYC 9575F processor-based system demonstrating performance speedups of ~1.01x, ~1.51x, and ~2.37x for kernel launches, Grok-1 (xAI) preprocessing, and MemCopy data transfers, respectively.[25]

Figure 17: Host processor

Wrapping Up

My previous blog examined the performance of the 5th generation AMD EPYC processors for enterprise and cloud workloads. This blog provided an overview of how 5th Gen AMD EPYC processors perform on demanding HPC and AI workloads. I compared the 5th Gen processors against both 4th Gen AMD EPYC and 5th Gen Intel Xeon processors. AMD EPYC 9005 Series Processors offer a range of configurations from 8 to 192 cores and TDPs from 155 to 500W. This analysis often selected the 64-core 5th Gen AMD EPYC 9575F, with its max boost frequency of 5.0 GHz, because of its high relevance to software stacks with per-core licenses. For open-source applications, I typically chose CPUs with higher core counts to show overall throughput because these applications are less constrained by license cost considerations.

Raghu Nambiar is a Corporate Vice President of Data Center Ecosystems and Solutions for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

References

  1. EPYC-18: Max boost for AMD EPYC processors is the maximum frequency achievable by any single core on the processor under normal operating conditions for server systems.
  2. 9xx5-031: AMD testing as of 09/12/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9575F powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Altair AcuSolve. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * acus-f1: ~1.62x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9575F (128 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: VOLCANO RVOT1000C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. Altair and AcuSolve are trademarks of Altair Engineering, Inc.
  3. 9xx5-030: AMD testing as of 09/12/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9554 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Altair AcuSolve. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * acus-f1: ~1.23x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9554 (128 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: Titanite_4G RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. Altair and AcuSolve are trademarks of Altair Engineering, Inc.
  4. 9xx5-037: AMD testing as of 09/17/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9575F powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Ansys CFX. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * Automotive Pump: ~1.44x, * LeMans Car: ~1.50x, * Airfoil 50: ~1.58x, * Airfoil 100: ~1.60x, *  Airfoil 10: ~1.55x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9575F (128 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: VOLCANO RVOT1000C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. 
Results may vary based on system configurations, software versions, and BIOS settings. ANSYS, CFX, and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries.
  5. 9xx5-036: AMD testing as of 09/17/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9554 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Ansys CFX. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * Automotive Pump: ~1.17x, * LeMans Car: ~1.21x, * Airfoil 50: ~1.20x, * Airfoil 100: ~1.21x, *  Airfoil 10: ~1.21x, System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9554 (128 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: Titanite_4G RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. 
Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. ANSYS, CFX, and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries.
  6. 9xx5-033: AMD testing as of 09/12/2024. The detailed results show the average uplift of the performance metric (Core Solver Rating) of this benchmark for a 2P 64-Core AMD EPYC™ 9575F powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Ansys Fluent. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * sedan_4m: ~1.52x, * rotor_3m: ~1.58x, * pump_2m: ~1.56x, * fluent-race280: ~1.64x, * oil_rig_7m: ~1.37x, * LeMans_6000_16m: ~1.61x, * landing_gear_15m: ~1.65x, * fluent-ice2: ~1.33x, * fluidized_bed_2m: ~1.51x, * f1_racecar-140m: ~1.67x, * exhaust_system_33m: ~1.60x, * combustor_71m: ~1.63x, * combustor_12m: ~1.55x, * aircraft_14m: ~1.68x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. 
CPU: 2P 64-Core AMD EPYC™ 9575F (128 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: VOLCANO RVOT1000C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. ANSYS, FLUENT and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries.
  7. 9xx5-032: AMD testing as of 09/12/2024. The detailed results show the average uplift of the performance metric (Core Solver Rating) of this benchmark for a 2P 64-Core AMD EPYC™ 9554 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Ansys Fluent. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * sedan_4m: ~1.18x, * rotor_3m: ~1.25x, * pump_2m: ~1.17x, * fluent-race280: ~1.23x, * oil_rig_7m: ~1.11x, * LeMans_6000_16m: ~1.26x, * landing_gear_15m: ~1.27x, * fluent-ice2: ~1.11x, * fluidized_bed_2m: ~1.20x, * f1_racecar-140m: ~1.25x, * exhaust_system_33m: ~1.21x, * combustor_71m: ~1.23x, * combustor_12m: ~1.25x, * aircraft_14m: ~1.29x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag.
CPU: 2P 64-Core AMD EPYC™ 9554 (128 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: Titanite_4G RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings.  ANSYS, FLUENT and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries.
  8. 9xx5-035A: AMD testing as of 10/03/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9575F powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Ansys LS-DYNA. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * Neon: ~1.68x, * Car2Car: ~1.72x, * 3 Cars: ~1.49x, * ODB 10m: ~1.63x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9575F (128 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: None RVOT1000C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings. ANSYS, LS-DYNA and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries. LS-DYNA is a registered trademark of Livermore Software Technology Corporation.
  9. 9xx5-034A: AMD testing as of 10/03/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9554 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Ansys LS-DYNA. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * Neon: ~1.40x, * Car2Car: ~1.30x, * 3 Cars: ~1.27x, * ODB 10m: ~1.22x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9554 (128 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings. ANSYS, LS-DYNA and any and all ANSYS, Inc. brand, product, service and feature names, logos and slogans are registered trademarks or trademarks of ANSYS, Inc. or its subsidiaries in the United States or other countries. LS-DYNA is a registered trademark of Livermore Software Technology Corporation.
  10. 9xx5-029: AMD testing as of 09/12/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9575F powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Altair RADIOSS. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * rad-dropsander: ~1.74x, * rad-neon: ~1.59x, * rad-venza: ~1.44x, * rad-t10m: ~1.53x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9575F (128 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: VOLCANO RVOT1000C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. 
Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. Altair and Radioss are trademarks of Altair Engineering, Inc.
  11. 9xx5-028: AMD testing as of 09/12/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 64-Core AMD EPYC™ 9554 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Altair Radioss. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * rad-dropsander: ~1.36x, * rad-neon: ~1.14x, * rad-venza: ~1.09x, * rad-t10m: ~1.15x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: ThinkSystem SR650 V3 ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 64-Core AMD EPYC™ 9554 (128 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: Titanite_4G RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. 
Results may vary based on factors including but not limited to system configurations, software versions, and BIOS settings. Altair and Radioss are trademarks of Altair Engineering, Inc.
  12. 9XX5-096: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (ns/day) of this benchmark for a 2P 128-Core AMD EPYC™ 9755 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Open-Source GROMACS. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * benchPEP: ~3.35x, * gmx_water1536K_PME: ~3.12x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None, ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 128-Core AMD EPYC™ 9755 (256 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: None RVOT1000A1; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings.
  13. 9XX5-095: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (ns/day) of this benchmark for a 2P 96-Core AMD EPYC™ 9654 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Open-Source GROMACS. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * benchPEP: ~1.91x, * gmx_water1536K_PME: ~1.94x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 96-Core AMD EPYC™ 9654 (192 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: None RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings.
  14. 9XX5-098: AMD testing as of 10/07/2024. The detailed results show the arithmetic mean of the performance metric (Number of Atoms) of this benchmark for the 128-Core AMD EPYC™ 9755, and the 64-Core Intel® Xeon® PLATINUM 8592+ running select tests on Open-Source lammps, both in Number of Atoms. For this application, a higher result indicates better performance. 64-Core Intel® Xeon® PLATINUM 8592+, lammps_HECBioSim_1400k: 1.256, 1.249, 1.255, lammps_HECBioSim_20k: 71.11, 71.027, 71.024, lammps_HECBioSim_3000k: 0.615, 0.616, 0.616, lammps_HECBioSim_465k: 3.629, 3.637, 3.644, lammps_HECBioSim_61k: 26.244, 26.235, 26.232. 128-Core AMD EPYC™ 9755, lammps_HECBioSim_1400k: 2.961, 2.964, 2.926, lammps_HECBioSim_3000k: 1.375, 1.415, 1.403, lammps_HECBioSim_465k: 8.453, 8.392, 8.418, lammps_HECBioSim_61k: 58.411, 58.448, 58.292, lammps_HECBioSim_20k: 127.672, 127.978, 127.321.
  15. 9XX5-097: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (Number of Atoms) of this benchmark for a 2P 96-Core AMD EPYC™ 9654 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Open-Source lammps. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * lammps_HECBioSim_1400k: ~1.55x, * lammps_HECBioSim_20k: ~1.47x, * lammps_HECBioSim_3000k: ~1.57x, * lammps_HECBioSim_465k: ~1.60x, * lammps_HECBioSim_61k: ~1.47x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 96-Core AMD EPYC™ 9654 (192 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: None RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.3 5.14.0-362.8.1.el9_3.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. 
Results may vary based on system configurations, software versions, and BIOS settings.
  16. 9xx5-094: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (ns/day) of this benchmark for a 2P 128-Core AMD EPYC™ 9755 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+  powered system running select tests on Open-Source NAMD. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * namd-stmv20M: ~3.24x, * namd-apoa1: ~1.35x, * namd-f1atpase: ~1.79x, * namd-stmv: ~2.23x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 128-Core AMD EPYC™ 9755 (256 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: None RVOT1000A1; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings.
  17. 9xx5-093: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (ns/day) of this benchmark for a 2P 96-Core AMD EPYC™ 9654  powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+  powered system running select tests on Open-Source NAMD. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * namd-stmv20M: ~1.93x, * namd-apoa1: ~1.13x, * namd-f1atpase: ~1.23x, * namd-stmv: ~1.37x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 96-Core AMD EPYC™ 9654 (192 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: None RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.3 5.14.0-362.8.1.el9_3.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings.
  18. 9xx5-100: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 128-Core AMD EPYC™ 9755 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Open-Source quantum_espresso. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * qe-7.0_Ta205: ~1.98x, * qe-7.0_Ausurf: ~2.20x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 128-Core AMD EPYC™ 9755 (256 total cores); Memory: 24x 64 GB DDR5-6000; Storage: SAMSUNG MZWLO3T8HCLS-00A07; Platform and BIOS: None RVOT1000A1; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings.
  19. 9xx5-099: AMD testing as of 10/07/2024. The detailed results show the average uplift of the performance metric (Elapsed Time) of this benchmark for a 2P 96-Core AMD EPYC™ 9654 powered system compared to a 2P 64-Core Intel® Xeon® PLATINUM 8592+ powered system running select tests on Open-Source quantum_espresso. Uplifts for the performance metric normalized to the 64-Core Intel® Xeon® PLATINUM 8592+ follow for each benchmark: * qe-7.0_Ta205: ~1.16x, * qe-7.0_Ausurf: ~1.13x. System Configurations: CPU: 2P 64-Core Intel® Xeon® PLATINUM 8592+ (128 total cores); Memory: 16x 64 GB DDR5-5600; Storage: KIOXIA KCMYXRUG3T84; Platform and BIOS: None ESE122V-3.10; BIOS Options: SMT=Off, High Performance Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: processor.max_cstate=1, intel_idle.max_cstate=0, iommu=pt, mitigations=off; Runtime Options: cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. CPU: 2P 96-Core AMD EPYC™ 9654 (192 total cores); Memory: 24x 64 GB DDR5-4800; Storage: SAMSUNG MZQL21T9HCJR-00A07; Platform and BIOS: None RTI1009C; BIOS Options: SMT=Off, NPS=4, Power Determinism Mode; OS: rhel 9.4 5.14.0-427.16.1.el9_4.x86_64; Kernel Options: amd_iommu=on, iommu=pt, mitigations=off; Runtime Options: cpupower idle-set -d 2, cpupower frequency-set -g performance, echo 3 > /proc/sys/vm/drop_caches, echo 0 > /proc/sys/kernel/nmi_watchdog, echo 0 > /proc/sys/kernel/numa_balancing, echo 0 > /proc/sys/kernel/randomize_va_space, echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled, echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag. Results may vary based on system configurations, software versions, and BIOS settings.
  20. 9xx5-012: TPCxAI @SF30 Multi-Instance 32C Instance Size throughput results based on AMD internal testing as of 09/05/2024 running multiple VM instances. The aggregate end-to-end AI throughput test is derived from the TPCx-AI benchmark and as such is not comparable to published TPCx-AI results, as the end-to-end AI throughput test results do not comply with the TPCx-AI Specification. 2P AMD EPYC 9965 (384 Total Cores), 12 32C instances, NPS1, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled). 2P AMD EPYC 9755 (256 Total Cores), 8 32C instances, NPS1, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled). 2P AMD EPYC 9654 (192 Total cores), 6 32C instances, NPS1, 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe, Ubuntu 22.04.3 LTS, BIOS 1006C (SMT=off, Determinism=Power). Versus 2P Xeon Platinum 8592+ (128 Total Cores), 4 32C instances, AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled). Results: CPU Median Relative Generational Turin 192C, 12 Inst 6067.531 3.775 2.278, Turin 128C, 8 Inst 4091.85 2.546 1.536, Genoa 96C, 6 Inst 2663.14 1.657 1, EMR 64C, 4 Inst 1607.417 1 NA. 
Results may vary due to factors including system configurations, software versions and BIOS settings. TPC, TPC Benchmark and TPC-C are trademarks of the Transaction Processing Performance Council.
  21. 9xx5-040A: XGBoost Configurations: v2.2.1, Higgs Data Set, 32 Core Instances, FP32. 2P AMD EPYC 9965 (384 Total Cores), 12 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-45-generic (tuned-adm profile throughput-performance, ulimit -l 198078840, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1. 2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198094956, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1. 2P AMD EPYC 9654 (192 Total cores), 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198120988, ulimit -n 1024, ulimit -s 8192), BIOS TTI100BA (SMT=off, Determinism=Power), NPS=1. Versus 2P Xeon Platinum 8592+ (128 Total Cores), AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe®, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled). Results: CPU Run 1 Run 2 Run 3 Median Relative Throughput Generational; 2P Turin 192C, NPS1 1565.217 1537.367 1553.957 1553.957 3 2.41; 2P Turin 128C, NPS1 1103.448 1138.34 1111.969 1111.969 2.147 1.725; 2P Genoa 96C, NPS1 662.577 644.776 640.95 644.776 1.245 1; 2P EMR 64C 517.986 421.053 553.846 517.986 1 NA. Results may vary due to factors including system configurations, software versions and BIOS settings.
  22. 9xx5-042: FAISS (Requests/Hour) throughput results based on AMD internal testing as of 09/05/2024. FAISS Configurations: v1.8.0.post1, sift1m Data Set, 32 Core Instances, FP32, MKL 2024.2.1. 2P AMD EPYC 9965 (384 Total Cores), 12 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1. 2P AMD EPYC 9755 (256 Total Cores), 8 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1. 2P AMD EPYC 9654 (192 Total cores), 6 x 32 core instances, 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe, Ubuntu 22.04.3 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS 1006C (SMT=off, Determinism=Power), NPS=1. Versus 2P Xeon Platinum 8592+ (128 Total Cores), 6 x 32 core instances, AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled). Results: CPU Run 1 Run 2 Run 3 Median Relative Throughput Generational, 2P Turin 192C 63.6 63.24 63.21 63.24 3.637 1.442, 2P Turin 128C 46.62 46.61 46.65 46.62 2.681 1.063, Genoa 96C 43.904 43.853 43.799 43.853 2.522 1, 2P EMR 64C 17.41 17.36 17.39 17.39 1 NA. 
Results may vary due to factors including system configurations, software versions and BIOS settings.
  23. 9xx5-049: MMoE (Runs/Hour) throughput results based on AMD internal testing as of 09/05/2024. MMoE Configurations: Version r1.15.5-deeprec2302, Taobao Data Set, 32 Core Instances, FP32. 2P AMD EPYC 9965 (384 Total Cores), 12 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1. 2P AMD EPYC 9755 (256 Total Cores), 8 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=4. 2P AMD EPYC 9654 (192 Total cores), 6 x 32 core instances, 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe®, Ubuntu 22.04.3 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198096812, ulimit -n 1024, ulimit -s 8192), BIOS 1006C (SMT=off, Determinism=Power), NPS=1. Versus 2P Xeon Platinum 8592+ (128 Total Cores), 4 x 32 core instances, AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe®, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled). Results: CPU Run 1 Run 2 Run 3 Median Relative Throughput Generational, 2P Turin 192C 41.34 41.31 41.54 41.34 2.989 2.115, 2P Turin 128C 28.4 28.4 28.4 28.4 2.054 1.453, 2P Genoa 96C 19.47 19.65 19.55 19.55 1.414 1, 2P EMR 64C 13.87 13.73 13.83 13.83 1 NA. 
Results may vary due to factors including system configurations, software versions and BIOS settings.
  24. 9xx5-065: GPT-J-6B throughput results based on AMD internal testing as of 09/30/2024. GPT-J-6B configurations: ZenDNN 5.0 (zentorch) and IPEX 2.4.0, BF16, batch size 16, Use Case Input/Output token configurations: [Summary = 1024/128, Chatbot = 128/128, Translate = 1024/1024, Essay = 128/1024, Caption = 16/16]. 2P AMD EPYC 9965 (384 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1 DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.3 LTS, Linux 5.15.0-119-generic, BIOS RVOT1000C, (SMT=off, Determinism=Power, Frequency Boost=Enabled), NPS=1, ZenDNN 5.0. 2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1 DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, Linux 6.8.0-40-generic, BIOS RVOT0090F, (SMT=off, Determinism=Power, Frequency Boost=Enabled), NPS=1, ZenDNN 5.0. 2P AMD EPYC 9654 (192 Total Cores), 1.5TB 24x64GB DDR5-4800 (at 6000 MT/s), 1 DPC, MT28908 Family [ConnectX-6], 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, Linux 5.15.85-051585-generic, BIOS RTI1008C, (SMT=off, Determinism=Power, Frequency Boost=Enabled), NPS=1, ZenDNN 5.0. Versus 2P Xeon Platinum 8592+ (128 Total Cores), AMX On, 1.8TB 128GB DIMM DDR5-4800 (running at 4400 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, Samsung SSD 970 EVO Plus 1TB, Ubuntu 22.04.3 LTS, Linux 6.8.0-40-generic, BIOS 2.1a (SMT=off, Determinism=Power, Turbo Boost = Enabled), IPEX 2.4.0. Results: CPU 8592+ 9965 9755 9654, Summary 149.949 361.283 304.801 171.833, Chatbot 323.274 725.659 640.038 372.625, Translate 135.866 386.077 346.346 228.482, Essay 241.239 536.153 489.585 315.229, Caption 426.433 949.117 830.371 492.055, Average 255.352 591.658 522.228 316.045, Competitive 1 2.389 2.108 1.288, Normal to 9654 0.776 1.854 1.636 1. Results may vary due to factors including system configurations, software versions and BIOS settings.
  25. 9xx5-084: Comparisons based on AMD internal testing as of 10/06/2024. Workloads: MemCopy v1.0 (8 threads / 8 GPUs, nvhpc 24.3), KernelLaunch v1.0 (8 threads / 8 GPUs, nvhpc 24.3), Grok1-324B (FP16, JAX 0.4.25, nvhpc 24.3, sentencepiece 0.2.0, numpy 1.26.4, dm_haiku 0.0.12, 2 / 8 experts, 11 token input prompt with 105 token output prompt). 2P AMD EPYC 9575F (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1.5TB 24x64GB DDR5-6000, 1.0 Gbps 3TB Micron_9300_MTFDHAL3T8TDP NVMe®, BIOS T20240805173113 (Determinism=Power, SR-IOV=On), Ubuntu 22.04.3 LTS, kernel=5.15.0-117-generic (mitigations=off, cpupower frequency-set -g performance, cpupower idle-set -d 2, echo 3 > /proc/sys/vm/drop_caches), - average over 3 runs 77.13 seconds (MemCopy), - average over 3 runs 213.97 seconds (Kernel Launch), - average over 3 runs 99.00 seconds (Grok). 2P Intel Xeon Platinum 8592+ (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1TB 16x64GB DDR5-5600, 3.2TB Dell Ent NVMe® PM1735a MU, Ubuntu 22.04.3 LTS, kernel-5.15.0-118-generic, (processor.max_cstate=1, intel_idle.max_cstate=0, mitigations=off, cpupower frequency-set -g performance), - average over 3 runs 183.58 seconds (MemCopy), - average over 3 runs 212.67 seconds (Kernel Launch), - average over 3 runs 163.98 seconds (Grok). This yields a 138.01% performance gain in MemCopy, a 0.61% performance gain in KernelLaunch, and a 51.77% performance gain in Grok1-324B, or an overall performance gain of 53.7% (geometric mean). Results may vary due to factors including system configurations, software versions and BIOS settings.
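For readers who want to retrace the arithmetic in the AI footnotes, here is a minimal Python sketch of how the "Median", "Relative Throughput", and "Generational" columns and the geometric-mean summary appear to be derived. It uses only the MMoE run values printed in footnote 23 and the three percentage gains as stated in footnote 25. The function names `relative_to` and `geometric_mean_gain` are illustrative and not part of any AMD tooling, and the method shown is an assumption inferred from the published numbers rather than a documented AMD procedure.

```python
import math
from statistics import median

# Runs/Hour for MMoE, copied from footnote 23 (three runs per system).
mmoe_runs = {
    "2P Turin 192C": [41.34, 41.31, 41.54],
    "2P Turin 128C": [28.4, 28.4, 28.4],
    "2P Genoa 96C": [19.47, 19.65, 19.55],
    "2P EMR 64C": [13.87, 13.73, 13.83],
}


def relative_to(runs, baseline):
    """Divide each system's median result by the baseline system's median."""
    base = median(runs[baseline])
    return {name: median(values) / base for name, values in runs.items()}


def geometric_mean_gain(percent_gains):
    """Convert percent gains to speedup factors, take their geometric mean,
    and return the result as a percent gain again."""
    factors = [1.0 + gain / 100.0 for gain in percent_gains]
    return (math.prod(factors) ** (1.0 / len(factors)) - 1.0) * 100.0


# "Relative Throughput" is normalized to the Intel (EMR) system;
# "Generational" is normalized to the 4th Gen EPYC (Genoa) system.
competitive = relative_to(mmoe_runs, "2P EMR 64C")
generational = relative_to(mmoe_runs, "2P Genoa 96C")

# Percent gains for MemCopy, KernelLaunch, and Grok as stated in footnote 25.
overall_gain = geometric_mean_gain([138.01, 0.61, 51.77])

for name in mmoe_runs:
    print(f"{name}: {competitive[name]:.3f}x vs EMR, "
          f"{generational[name]:.3f}x vs Genoa")
print(f"Geometric-mean gain: {overall_gain:.1f}%")
```

The geometric mean is used here instead of the arithmetic mean because it averages ratios, so one very large uplift such as MemCopy does not dominate the summary figure.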
About the Author
Raghu Nambiar currently holds the position of Corporate Vice President at AMD, where he leads a global engineering team dedicated to shaping the software and solutions strategy for the company's datacenter business. Before joining AMD, Raghu served as the Chief Technology Officer at Cisco UCS, instrumental in driving its transformation into a leading datacenter compute platform. During his tenure at Hewlett Packard, Raghu made significant contributions as an architect, pioneering several groundbreaking solutions. He is the holder of ten patents, with several more pending approval, and has made extensive academic contributions, including publishing over 75 peer-reviewed papers and 20 books in the LNCS series. Additionally, Raghu has taken on leadership roles in various industry standards committees. Raghu holds dual Master's degrees from the University of Massachusetts and Goa University, complemented by completing an advanced management program at Stanford University.