
Discover the world’s leading CPU for AI at SC24

Joshua_Howell

Artificial intelligence (AI) is fueling business growth and innovation on a global scale. AI technologies bring broad industry impact, transforming how we work and driving efficiency, insights, and competitiveness. Organizations of nearly every size can harness the power of AI to uncover new opportunities, optimize their operations, and deliver value to customers in ways that were previously unimaginable. We are only beginning to see the potential of AI in applications that automate processes in manufacturing and automotive, curb financial fraud, and drive breakthroughs in medical research. The possibilities are truly limitless, yet organizations often lack the infrastructure to keep pace with the demands of AI.

AI requires exceptional performance, flexibility, and capacity to process massive datasets and convert the data into real-time insights. Many data centers are already running at or near capacity in terms of available space, power, or both. To overcome these challenges, organizations must free up space and energy to accommodate AI in their data centers.  

AMD helps organizations transform their environments with purposefully engineered solutions that are built for highly complex workloads. AMD EPYC™ processors are the world’s leading CPU for AI.¹ AMD EPYC processor-based servers offer leadership performance and efficiency to enable material workload consolidation. This capability allows for more space and energy to support new AI workloads in existing data centers. As a result, organizations can accelerate the full range of data center workloads — including general purpose AI, model development, testing, training, and inference.

5th Generation AMD EPYC processors are the newest addition to this robust family, offering key advantages for organizations looking to accomplish more with AI:

  • Maximizing per-server performance. 5th Generation AMD EPYC processors can match the integer performance of legacy hardware with up to 86% fewer racks, dramatically reducing the physical footprint, power consumption, and number of software licenses needed. This frees up space for new or expanded AI workloads.²
  • Delivering leadership AI inference performance. Many AI workloads can run efficiently on CPU-only servers powered by 5th Generation AMD EPYC processors, including language models with up to 13 billion parameters, image and fraud analysis, and recommendation systems (see the sketch after this list). Servers running two of the latest CPUs offer up to 2x the inference throughput of previous-generation offerings.³
  • Increasing GPU acceleration. For larger and more demanding workloads, GPUs may be the right choice for AI processing. The AMD EPYC family includes options optimized to serve as host CPUs for GPU-enabled systems, helping increase performance on select AI workloads and improve the ROI of advanced GPU AI engines. For example, a server powered by high-frequency AMD EPYC 9575F processors with 8x GPUs delivers up to 20% greater system performance running Llama3.1-70B than a server using Intel Xeon 8592+ processors as the host CPUs with the same 8x GPUs.⁴
  • Offering a broad ecosystem of support. AMD collaborates with an extensive network of solution providers whose offerings feature the latest AMD EPYC processors. Companies and government organizations around the globe trust AMD to enhance their most important workloads. 5th Generation AMD EPYC processors are available today, with support from industry leaders in supercomputing and AI as well as all major ODMs and cloud service providers.
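
To make the CPU-only inference point above concrete, here is a minimal, hypothetical sketch of running a small causal language model entirely on CPU with Hugging Face Transformers. The model choice (facebook/opt-1.3b) and generation settings are illustrative assumptions, not the configuration AMD benchmarked; models in the 13-billion-parameter class follow the same pattern with more memory.

```python
# Hypothetical CPU-only inference sketch; assumes `pip install torch transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-1.3b"  # illustrative stand-in, not AMD's tested model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()  # inference only: disable dropout and other training behavior

prompt = "Consolidating legacy servers frees data center capacity for"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():  # no gradients needed for generation
    out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```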

Want to learn more about the world’s leading CPU for AI? Join us at SC24 on November 17th–22nd in Atlanta. Visit the AMD booth to meet with our experts and watch technology demos showcasing AMD EPYC processors at Hardware Zone 4 and Zone 2 #8 and #9. 

Let’s accelerate your path to AI leadership. 

 

Footnotes 

1 https://www.amd.com/en/products/processors/server/epyc/ai.html 

2 9xxx5TCO-001B: This scenario contains many assumptions and estimates and, while based on AMD internal research and best approximations, should be considered an example for information purposes only, and not used as a basis for decision making over actual testing. The AMD Server & Greenhouse Gas Emissions TCO (total cost of ownership) Estimator Tool, version 1.12, compares the selected AMD EPYC™ and Intel® Xeon® CPU-based server solutions required to deliver 39100 units of SPECrate2017_int_base performance as of October 8, 2024. This scenario compares a legacy 2P Intel Xeon 28-core Platinum 8280 based server with a score of 391 versus a 2P EPYC 9965 (192C) powered server with a score of 3030 (https://spec.org/cpu2017/results/res2024q3/cpu2017-20240923-44833.pdf), along with a comparison upgrade to a 2P Intel Xeon Platinum 8592+ (64C) based server with a score of 1130 (https://spec.org/cpu2017/results/res2024q3/cpu2017-20240701-43948.pdf). Actual SPECrate®2017_int_base score for 2P EPYC 9965 will vary based on OEM publications.
Environmental impact estimates were made leveraging this data, using the country/region-specific electricity factors from the 2024 International Country Specific Electricity Factors 10 (July 2024) and the United States Environmental Protection Agency 'Greenhouse Gas Equivalencies Calculator'.
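
As a rough sanity check on the consolidation arithmetic, the server counts implied by the quoted SPECrate2017_int_base scores can be reproduced in a few lines. This is back-of-envelope math only; the headline 86% rack figure additionally reflects rack-density and power assumptions inside the TCO tool.

```python
# Back-of-envelope check of the consolidation scenario in footnote 2.
import math

target = 39100        # required SPECrate2017_int_base units
legacy_score = 391    # per 2P Intel Xeon Platinum 8280 server
epyc_score = 3030     # per 2P AMD EPYC 9965 server

legacy_servers = math.ceil(target / legacy_score)  # 100 servers
epyc_servers = math.ceil(target / epyc_score)      # 13 servers
print(legacy_servers, epyc_servers,
      f"{1 - epyc_servers / legacy_servers:.0%} fewer servers")
```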

3 9xx5-040A: XGBoost (Runs/Hour) throughput results based on AMD internal testing as of 09/05/2024.
XGBoost Configurations: v2.2.1, Higgs Data Set, 32 Core Instances, FP32 
2P AMD EPYC 9965 (384 Total Cores), 12 x 32 core instances, 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu® 22.04.4 LTS, 6.8.0-45-generic (tuned-adm profile throughput-performance, ulimit -l 198078840, ulimit -n 1024, ulimit -s 8192), BIOS RVOT1000C (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1 
2P AMD EPYC 9755 (256 Total Cores), 1.5TB 24x64GB DDR5-6400 (at 6000 MT/s), 1DPC, 1.0 Gbps NetXtreme BCM5720 Gigabit Ethernet PCIe, 3.5 TB Samsung MZWLO3T8HCLS-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198094956, ulimit -n 1024, ulimit -s 8192), BIOS RVOT0090F (SMT=off, Determinism=Power, Turbo Boost=Enabled), NPS=1 
2P AMD EPYC 9654 (192 Total Cores), 1.5TB 24x64GB DDR5-4800, 1DPC, 2 x 1.92 TB Samsung MZQL21T9HCJR-00A07 NVMe®, Ubuntu 22.04.4 LTS, 6.8.0-40-generic (tuned-adm profile throughput-performance, ulimit -l 198120988, ulimit -n 1024, ulimit -s 8192), BIOS TTI100BA (SMT=off, Determinism=Power), NPS=1
Versus 2P Xeon Platinum 8592+ (128 Total Cores), AMX On, 1TB 16x64GB DDR5-5600, 1DPC, 1.0 Gbps NetXtreme BCM5719 Gigabit Ethernet PCIe, 3.84 TB KIOXIA KCMYXRUG3T84 NVMe®, Ubuntu 22.04.4 LTS, 6.5.0-35 generic (tuned-adm profile throughput-performance, ulimit -l 132065548, ulimit -n 1024, ulimit -s 8192), BIOS ESE122V (SMT=off, Determinism=Power, Turbo Boost = Enabled) 
Results: 
CPU                    Run 1      Run 2      Run 3      Median     Relative Throughput   Generational
2P Turin 192C, NPS1    1565.217   1537.367   1553.957   1553.957   3                     2.41
2P Turin 128C, NPS1    1103.448   1138.34    1111.969   1111.969   2.147                 1.725
2P Genoa 96C, NPS1     662.577    644.776    640.95     644.776    1.245                 1
2P EMR 64C             517.986    421.053    553.846    517.986    1                     NA
Results may vary due to factors including system configurations, software versions and BIOS settings. 
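
For readers who want to approximate a comparable measurement, the sketch below shows one plausible shape for a runs/hour harness. It is not AMD's test code: it substitutes synthetic FP32 data for the Higgs set (which has 28 features) and an assumed boosting-round count, while mirroring the 32-core instance size from the configuration above.

```python
# Hypothetical runs/hour measurement for XGBoost; assumes `pip install xgboost numpy`.
import time
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
# Synthetic stand-in for the Higgs data set (28 features, binary label), FP32.
X = rng.standard_normal((100_000, 28), dtype=np.float32)
y = rng.integers(0, 2, size=100_000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "tree_method": "hist",
    "nthread": 32,  # one 32-core instance, per the configuration above
}

runs = 3
start = time.perf_counter()
for _ in range(runs):
    xgb.train(params, dtrain, num_boost_round=100)  # round count is an assumption
elapsed = time.perf_counter() - start
print(f"runs/hour: {runs / elapsed * 3600:.1f}")
```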

4 9xx5-014: Llama3.1-70B inference throughput results based on AMD internal testing as of 09/01/2024.
Llama3.1-70B configurations: TensorRT-LLM 0.9.0, nvidia/cuda 12.5.0-devel-ubuntu22.04, FP8, Input/Output token configurations (use cases): [BS=1024 I/O=128/128, BS=1024 I/O=128/2048, BS=96 I/O=2048/128, BS=64 I/O=2048/2048]. Results in tokens/second.
2P AMD EPYC 9575F (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1.5TB 24x64GB DDR5-6000, 1.0 Gbps 3TB Micron_9300_MTFDHAL3T8TDP NVMe®, BIOS T20240805173113 (Determinism=Power, SR-IOV=On), Ubuntu 22.04.3 LTS, kernel=5.15.0-117-generic (mitigations=off, cpupower frequency-set -g performance, cpupower idle-set -d 2, echo 3 > /proc/sys/vm/drop_caches),
2P Intel Xeon Platinum 8592+ (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1TB 16x64GB DDR5-5600, 3.2TB Dell Ent NVMe® PM1735a MU, Ubuntu 22.04.3 LTS, kernel=5.15.0-118-generic (processor.max_cstate=1, intel_idle.max_cstate=0, mitigations=off, cpupower frequency-set -g performance), BIOS 2.1 (Maximum performance, SR-IOV=On).
I/O Tokens   Batch Size   EMR        Turin      Relative
128/128      1024         814.678    1101.966   1.353
128/2048     1024         2120.664   2331.776   1.1
2048/128     96           114.954    146.187    1.272
2048/2048    64           333.325    354.208    1.063
For an average throughput increase of 1.197x.
Results may vary due to factors including system configurations, software versions and BIOS settings.
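
For reference, the Relative column and the 1.197x average follow directly from the tokens/second figures in the table above:

```python
# Reproduce the Relative column and the 1.197x average from footnote 4.
emr   = [814.678, 2120.664, 114.954, 333.325]    # Xeon Platinum 8592+ host
turin = [1101.966, 2331.776, 146.187, 354.208]   # EPYC 9575F host

rel = [t / e for t, e in zip(turin, emr)]
print([round(r, 3) for r in rel])        # [1.353, 1.1, 1.272, 1.063]
print(round(sum(rel) / len(rel), 3))     # 1.197
```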