
The AMD Advantage for AI and Data Centers

The Modern Data Center: Blending AI and General-Purpose Computing

Data centers are increasingly the engine driving commerce. In today’s digital economy, web servers, databases, design and analysis systems, and more are essential to businesses across the globe. But data centers are no longer just processing traditional enterprise workloads. Today, AI capabilities such as real-time recommendation engines, predictive maintenance, vision and language processing, and machine learning are augmenting traditional enterprise applications, completely transforming the data center landscape to drive more innovation and productivity. The challenge? Powering the growing demands of these AI-augmented workloads efficiently while helping ensure availability and scalability for the future.

 

The Changing Nature of Workloads

Modern data centers are seeing a blend of:

  • General-Purpose Compute: Web hosting, ERP systems, transactional databases, analytics.
  • Enterprise AI Tasks: AI-powered fraud detection, document translation, natural language processing.
  • AI Model Inference & Training: AI-driven chatbots, real-time transcription, machine learning pipelines.

The mix of these workloads means IT leaders must ensure their enterprise infrastructure is agile enough to support both traditional “run-the-business” apps and AI-powered applications, without unnecessary cost or complexity.

 

Hardware That’s Built for Modern Data Centers

 

To handle this evolving demand, the latest AMD EPYC CPUs are designed to excel at both traditional and AI workloads. These CPUs power a complete and diverse portfolio of systems from the world’s trusted server solution providers and global cloud service providers to meet the most demanding business needs. These offerings feature:

  • Unparalleled x86 Core Density – Up to 192 cores per socket, with a full portfolio of CPU offerings enabling high-performance execution of both AI inference and general compute tasks of all sizes.
  • Leadership CPU Memory Capacity & Bandwidth – Support for terabytes of the latest high-speed, industry-standard DDR5 memory, critical for scalable traditional workloads as well as AI models that require large datasets to be kept in memory.
  • Scalability Without Disruption – The broadly supported x86 architecture allows seamless AI adoption without the lengthy code rewrites or costly software porting efforts needed to adapt enterprise code to alternative architectures.
  • Energy Efficiency for AI and Business Apps – AMD EPYC outperforms the NVIDIA Grace CPU Superchip by up to 2.75x[i] in power efficiency.

This flexibility allows enterprises to deploy AI within their existing x86 compute infrastructure, while keeping the option of GPU-accelerated workloads open when needed.
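
To make this concrete, here is a minimal sketch of CPU-only AI inference, assuming PyTorch and the Hugging Face transformers package are installed; the model name is illustrative, and any CPU-friendly model would work the same way.

```python
# Minimal sketch: AI inference on the host CPU alone (no GPU required).
# Assumes PyTorch and transformers are installed; the model is illustrative.
import os

import torch
from transformers import pipeline

# Let the math libraries use every available hardware thread.
torch.set_num_threads(os.cpu_count())

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # -1 selects the CPU in the pipeline API
)
print(classifier("Quarterly revenue exceeded the forecast."))
```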

 

Preparing for the Continued Growth of AI

 

As AI adoption grows, workloads will continue to evolve, and enterprises need hardware that won’t hold them back. While GPUs are the ideal solution for training and large-scale generative AI, most enterprise workloads using natural language processing, decision support systems, and classical machine learning can run efficiently on modern CPUs, the same infrastructure that supports the most demanding enterprise applications.
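
For the classical machine learning case mentioned above, a hedged sketch using scikit-learn shows how such a workload naturally spreads across a many-core CPU; the synthetic dataset is a stand-in for real enterprise data.

```python
# Sketch: classical ML on a many-core CPU, assuming scikit-learn is installed.
# The synthetic dataset below stands in for real enterprise data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_jobs=-1 fans tree building out across every available hardware thread.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
```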

Rather than building separate, siloed infrastructure for AI and general-purpose computing, data centers must be designed for versatility, and AMD EPYC delivers the performance, efficiency, and flexibility to make this shift seamless and cost-effective.

The takeaway? Your compute infrastructure must be ready to support both AI and traditional workloads with minimal operational cost. AMD EPYC CPUs help ensure your data center is future-ready and high-performance, prepared for the next wave of AI adoption.

 

CPUs: The Smart Choice to Get More from GPUs

 

It’s well known that large-scale, low-latency AI workloads benefit from GPU acceleration. What is often overlooked, however, is that for workloads and deployments that require GPUs, selecting the right host CPU is a critical decision. 5th Gen AMD EPYC processors are the best choice for maximizing the performance of GPU-enabled clusters, providing up to 20% more throughput than competing x86 solutions[ii][iii].

 

High-Frequency Host Processing to Fuel AI Acceleration

 

5th Gen AMD EPYC CPUs reach clock speeds of up to 5 GHz, 16% higher than the 4.3 GHz top turbo frequency of the recently announced Intel Xeon 6745P and substantially higher than the 3.1 GHz base frequency of the NVIDIA Grace Superchip. This increased clock speed enables faster data movement, task orchestration, and efficient GPU communication, key factors in high-volume, low-latency AI training and inference operations.

 

Leadership Memory Support for AI Workloads

 

While it is often ideal to fit an entire model into the memory of a GPU, it is not always possible. In such cases, the server platform is responsible for handling large quantities of data quickly and efficiently. With support for a broad range of memory configurations and capacities, as well as leadership bandwidth per socket, AMD EPYC CPUs can allow entire AI models and datasets to be stored in system memory, minimizing bottlenecks caused by storage read/write cycles. This is a crucial advantage for real-time AI applications where rapid data access is critical.
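
As a back-of-the-envelope illustration (not AMD sizing guidance), the following sketch estimates whether a model’s weights and working set fit in host memory; the parameter count, data type, and overhead factor are all assumptions.

```python
# Rough sizing sketch: will a model plus its working set fit in host RAM?
# All numbers below are illustrative assumptions, not AMD guidance.
import psutil  # third-party package: pip install psutil

params = 70e9           # e.g., a 70B-parameter model (assumed)
bytes_per_param = 2     # BF16/FP16 weights
overhead = 1.3          # assumed factor for KV cache, activations, framework

needed_gb = params * bytes_per_param * overhead / 1e9
have_gb = psutil.virtual_memory().total / 1e9
print(f"need ~{needed_gb:.0f} GB; this host has {have_gb:.0f} GB of RAM")
```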

 

Flexibility and Scale with Leadership PCIe Support

 

Data movement is a potential bottleneck in GPU-accelerated workloads, but AMD EPYC processors offer up to 160 PCIe® Gen5 lanes in dual-socket configurations, enabling rapid transfers between GPUs, storage, and networking infrastructure using the industry-standard technologies of your choice. This gives AMD an edge in AI deployments and enterprise computing environments, where every millisecond counts and proprietary networking approaches can be costly and troublesome.
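
One way to observe PCIe transfer behavior in practice is a small host-to-device copy timing sketch; it assumes a CUDA-capable PyTorch build with at least one GPU attached, and the buffer size is arbitrary.

```python
# Sketch: time a pinned host-to-GPU copy over PCIe.
# Assumes a CUDA-capable PyTorch build and at least one GPU.
import time

import torch

buf = torch.empty(1 << 28, dtype=torch.uint8).pin_memory()  # 256 MB, page-locked
torch.cuda.synchronize()

start = time.perf_counter()
gpu_buf = buf.to("cuda", non_blocking=True)  # async copy from pinned memory
torch.cuda.synchronize()                     # wait for the transfer to finish
elapsed = time.perf_counter() - start

print(f"{buf.numel() / elapsed / 1e9:.1f} GB/s host-to-device")
```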

 

x86 Leadership: Enabling Enterprise AI

 

The enterprise market is more competitive than ever as companies face the challenge of doing more work on fixed financial and energy budgets, yet the x86 architecture remains the leader in the data center. Real-world benchmarks and enterprise compatibility considerations make one thing clear: AMD EPYC processors, built on the x86 architecture, deliver impressive performance, efficiency, and broadly deployed workload compatibility compared to Arm®-based solutions, as shown below.

 

Performance Leadership: AMD EPYC vs. Nvidia Grace Superchip

 

When it comes to raw compute power, AMD EPYC processors decisively outperform NVIDIA’s Grace Superchip across key workloads, including general-purpose computing, database transactions, AI inference, and high-performance computing (HPC).

Benchmark result highlights:

  • AMD EPYC CPUs deliver more than 2x the performance of NVIDIA Grace Superchip-based systems in workloads across multiple verticals. This blog showcases several tested benchmark comparisons featuring the AMD EPYC 9004 processor family. Stay tuned for an updated blog with results for the latest EPYC 9005 family of CPUs, which dramatically extends the performance and efficiency advantage of EPYC processors.
  • For database workloads (MySQL TPROC-C transactions), AMD EPYC 9004-based dual-socket systems outperform the NVIDIA Grace Superchip by ~2.17x[iv].
  • For video encoding (FFmpeg VP9 codec), AMD EPYC 9004 CPUs deliver ~2.90x higher throughput than NVIDIA Grace[v].
  • In energy efficiency testing based on SPECpower®, AMD EPYC 9754 CPU-based single- and dual-processor systems outperform an NVIDIA Grace Superchip system by ~2.50x[vi] and ~2.75x[i], respectively.

These results confirm what industry professionals have long known: x86-based AMD EPYC processors deliver leadership performance and efficiency.

 

Simultaneous Multithreading (SMT): A Crucial x86 Advantage

 

One factor supporting the outstanding performance and efficiency of many x86 systems is Simultaneous Multithreading (SMT), an x86 architecture feature that allows each CPU core to execute two threads simultaneously and can significantly increase overall throughput.

Why SMT Matters:

  • Improves efficiency in multi-threaded workloads such as AI inference, cloud computing, and many enterprise applications.
  • Enables optimal resource utilization, filling processing gaps when one thread is stalled.
  • Enhances power efficiency, as demonstrated in independent testing where AMD EPYC CPUs delivered 30-50% more performance with SMT enabled while consuming virtually the same power.

Many Arm-based CPUs, including those from NVIDIA and Ampere, lack SMT support, meaning they can leave valuable computing resources idle, which can result in lower overall efficiency, utilization, and performance. A quick way to check whether SMT is active on a given host is sketched below.
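
A minimal, hedged check compares logical and physical core counts, for example with the psutil package on Linux:

```python
# Sketch: is SMT active on this host? Assumes the psutil package is installed.
import psutil

logical = psutil.cpu_count(logical=True)    # hardware threads
physical = psutil.cpu_count(logical=False)  # physical cores

smt_on = logical is not None and physical is not None and logical > physical
print(f"{physical} physical cores, {logical} hardware threads "
      f"(SMT appears {'on' if smt_on else 'off'})")
```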

 

Proven Leadership and Industry Adoption

 

While many Arm-based CPUs are new and relatively unproven entries, AMD EPYC has already established itself as the data center leader with:

  • More than 450 unique server designs across major hardware vendors.
  • 1,000+ cloud instances across the world’s biggest cloud service providers.
  • Powering 162 of the fastest supercomputers in the world, solving humanity’s toughest challenges.
  • Powering cutting-edge Internet services infrastructures that serve billions of people each day.

The Verdict: AMD EPYC is the Clear Choice for AI and Data Centers

 

AMD EPYC CPUs excel in CPU inference, AI hosting, and overall data center performance. Whether you’re running AI on existing hardware, hosting high-performance GPU clusters, or looking for a cost-effective, power-efficient solution, AMD delivers:

  • Seamless AI Deployment on CPUs – Run many AI workloads efficiently without GPUs, helping save costs while maintaining high performance.
  • Leadership GPU Host Performance – Boost GPU cluster throughput by up to 20% with AMD EPYC CPUs[ii][iii].
  • x86 Compatibility for Maximum Flexibility – No expensive software porting from Arm-based alternatives, plus compatibility across broadly deployed business-critical applications for seamless integration.
  • Impressive Memory & I/O Support – Up to 6TB of DDR5 memory and 160 PCIe Gen5 lanes in dual-socket configurations for exceptional throughput.
  • Leadership Energy Efficiency – SMT and optimized core designs maximize power efficiency without sacrificing performance.

 

As AI and high-performance computing evolve, AMD continues to lead with cutting-edge innovations. Whether you’re looking to deploy AI today or to future-proof your data center infrastructure, AMD EPYC CPUs are the clear, uncompromising choice.

 

Get started with AMD EPYC today. Discover how AMD can accelerate your AI workloads and data center operations.

Explore AMD EPYC for AI → Learn More

 

[i] SP5-280: As of 07/12/2024, a 2P AMD EPYC™ 9754 system delivers a 2.75x SPECpower_ssj® 2008 overall ssj_ops/watt uplift versus a 2P NVIDIA Grace™ CPU Superchip system. Configurations: 2P 128-core EPYC 9754 (36,398 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q2/power_ssj2008-20240327-01386.html) versus 72-core Nvidia Grace Superchip (13,218 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q3/power_ssj2008-20240515-01413.html). SPEC® and SPECpower_ssj® 2008 are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.

 

[ii] 9xx5-059A: Stable Diffusion XL v2 training results based on AMD internal testing as of 10/10/2024. SDXL configurations: DeepSpeed 0.14.0, TP8 Parallel, FP8, batch size 24, results in seconds. 2P AMD EPYC 9575F (128 Total Cores) with 8x AMD Instinct MI300X-NPS1-SPX-192GB-750W, GPU Interconnectivity XGMI, ROCm™ 6.2.0-66, 2304GB 24x96GB DDR5-6000, BIOS 1.0 (power determinism = off), Ubuntu® 22.04.4 LTS, kernel 5.15.0-72-generic: 334.80 seconds. 2P Intel Xeon Platinum 8592+ (128 Total Cores) with 8x AMD Instinct MI300X-NPS1-SPX-192GB-750W, GPU Interconnectivity XGMI, ROCm 6.2.0-66, 2048GB 32x64GB DDR5-4400, BIOS 2.0.4 (power determinism = off), Ubuntu 22.04.4 LTS, kernel 5.15.0-72-generic: 400.43 seconds, for a 19.600% training performance increase. Results may vary due to factors including system configurations, software versions, and BIOS settings.

 

[iii] 9xx5-014A: Llama3.1-70B inference throughput results based on AMD internal testing as of 09/01/2024.

Llama3.1-70B configurations: TensorRT-LLM 0.9.0, nvidia/cuda 12.5.0-devel-ubuntu22.04, FP8, Input/Output token configurations (use cases): [BS=1024 I/O=128/128, BS=1024 I/O=128/2048, BS=96 I/O=2048/128, BS=64 I/O=2048/2048]. Results in tokens/second.

2P AMD EPYC 9575F (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1.5TB 24x64GB DDR5-6000, 1.0 Gbps 3TB Micron_9300_MTFDHAL3T8TDP NVMe®, BIOS T20240805173113 (Determinism=Power, SR-IOV=On), Ubuntu 22.04.3 LTS, kernel=5.15.0-117-generic (mitigations=off, cpupower frequency-set -g performance, cpupower idle-set -d 2, echo 3 > /proc/sys/vm/drop_caches).

2P Intel Xeon Platinum 8592+ (128 Total Cores) with 8x NVIDIA H100 80GB HBM3, 1TB 16x64GB DDR5-5600, 3.2TB Dell Ent NVMe® PM1735a MU, Ubuntu 22.04.3 LTS, kernel-5.15.0-118-generic (processor.max_cstate=1, intel_idle.max_cstate=0, mitigations=off, cpupower frequency-set -g performance), BIOS 2.1 (Maximum performance, SR-IOV=On).

I/O Tokens   Batch Size   EMR        Turin      Relative   Difference
128/128      1024         814.678    1101.966   1.353      287.288
128/2048     1024         2120.664   2331.776   1.1        211.112
2048/128     96           114.954    146.187    1.272      31.233
2048/2048    64           333.325    354.208    1.063      20.833

For an average throughput increase of 1.197x.

When scaling to a 1000-node cluster (1 node = 2 CPUs and 8 GPUs), comparing the AMD EPYC 9575F system and the Intel Xeon 8592+ system:

128/128 achieves 287,288 more tokens/s

128/2048 achieves 211,112 more tokens/s

2048/128 achieves 31,233 more tokens/s

2048/2048 achieves 20,833 more tokens/s

Results may vary due to factors including system configurations, software versions and BIOS settings.

 

[iv] SP5-258: In AMD testing as of 04/04/2024, a 2P AMD EPYC™ 9654 system (2 x 96C) running 24 MySQL v8.0.37 VMs using 16 vCPU /VM delivers ~2.17x the HammerDB v4.4 TPROC-C performance of a 2P Grace CPU Superchip system (2 x 72C) running 9 VMs using 16 vCPU /VM. Each instance ran one MySQL database with the schema created for 256 warehouses. All VMs were simultaneously loaded by an individual HammerDB client instance per VM. The workload was run for 10 minutes each, and the aggregate of median New Orders Per Minute (NOPM) values were recorded across 5 runs per platform to compare relative performance. The HammerDB TPROC-C workload is an open-source workload derived from the TPC-C Benchmark™ Standard, and as such is not comparable to published TPC-C™ results, as the results do not comply with the TPC-C Benchmark Standard. AMD System configuration: CPU: 2 x AMD EPYC 9654 (96C, 384 MB L3 per CPU) on an AMD reference system; RAM: 24 x 128 GB DDR5-4800; Storage: 6 * 1.7 TB Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO; NIC: NetXtreme BCM5720 Gigabit Ethernet PCIe @ 1Gbe; BIOS: RTI1006C; BIOS Options: default; OS: Ubuntu 22.04.1 LTS; Kernel: Linux 5.15.0-105-generic; OS Options: Default. NVIDIA System configuration: CPU: 2 x NVIDIA Grace CPU Superchip (72C, 228 MB L3 per socket) on a production system; RAM: 2 x 240 GB LPDDR5_SDRAM-8532; Storage: 3 TB NVME SAMSUNG MZTL23T8HCLS-00A07 & 900 GB NVME SAMSUNG MZ1L2960HCJR-00A07; NIC: Ethernet Controller 10G X550T, 2Ports; BIOS: 1.1; BIOS Options: Default; OS: Ubuntu 22.04.4 LTS; Kernel: Linux 5.15.0-101-generic. Results may vary based on factors including but not limited to BIOS and OS settings and versions, software versions and data used. TPC, TPC Benchmark and TPC-C are trademarks of the Transaction Processing Performance Council.

 

[v] SP5-266: In AMD testing as of 04/16/2024, a 2P AMD EPYC™ 9754 system (2 x 128C) running 64 instances using 8 logical CPU threads per instance delivers ~2.90x the FFmpeg v4.4.2 raw to vp9 4K Tears of Steel encoding performance of a 2P Grace CPU Superchip system (2 x 72C) running 18 instances using 8 physical CPU cores per instance. Each FFmpeg instance transcoded a single input file with 4K resolution in raw video format on an NVMe drive into an output file with the VP9 codec on a separate NVMe drive. Multiple FFmpeg jobs were run concurrently on each system, and aggregate performance on each system was compared using the median total frames processed per hour across 3 runs. AMD System configuration: CPU: 2 x AMD EPYC 9754 (128C, 256 MB L3 per CPU) on AMD reference system; RAM: 24 x 128 GB DDR5-4800; Storage: 6 * 1.7 TB Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO; NIC: NetXtreme BCM5720 Gigabit Ethernet PCIe @ 1Gbe; BIOS: RTI1006C; BIOS Options: default; OS: Ubuntu 22.04.1 LTS; Kernel: Linux 5.15.0-105-generic; OS Options: Default. NVIDIA System configuration: CPU: 2 x NVIDIA Grace CPU Superchip (72C, 228 MB L3 per socket) on production system; RAM: 2 x 240 GB LPDDR5_SDRAM-8532; Storage: 3 TB NVME SAMSUNG MZTL23T8HCLS-00A07 & 900 GB NVME SAMSUNG MZ1L2960HCJR-00A07; NIC: Ethernet Controller 10G X550T, 2Ports; BIOS: 1.1; BIOS Options: Default; OS: Ubuntu 22.04.4 LTS; Kernel: Linux 5.15.0-101-generic; OS Options: Default. Tears of Steel (CC) Blender Foundation | mango.blender.org. Results may vary based on factors including but not limited to BIOS and OS settings and versions, software versions and data used.

 

[vi] SP5-279: As of 07/12/2024, a 1P AMD EPYC™ 9754 system delivers a 2.50x SPECpower_ssj® 2008 overall ssj_ops/watt uplift versus a 2P NVIDIA Grace™ CPU Superchip system. Configurations: 1P 128-core EPYC 9754 (33,014 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2023q3/power_ssj2008-20230524-01270.html) versus 2P 72-core Nvidia Grace Superchip (13,218 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q3/power_ssj2008-20240515-01413.html). SPEC® and SPECpower_ssj® 2008 are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.

 

About the Author
James is a marketing, business development, and management professional for technology products and solutions businesses, experienced in the analysis of markets and emerging technologies and in defining solutions and enabling strategies to capitalize on them. He builds relationships across business and technical constituencies and defines technology solutions on a business basis. James is co-author of the book Intel® Trusted Execution Technology for Servers: A Guide to More Secure Datacenters and contributed to the US National Institute of Standards and Technology (NIST) Interagency Report 7904, Trusted Geolocation in the Cloud: A Proof of Concept Implementation. He has been a speaker, panelist, or presenter at a number of public venues, including the International Forum on Cyber Security, SecureTech Canada, Intel Developer Forum, OpenStack Conference and Design Summit, InfoSec World 2011, VMworld, ENSA@Work, RSA in the US and Europe, CSA Congress, HP Discover, and other industry forums.
Marketing, business development and management professional for technology products and solutions businesses. Experienced in the analysis of markets and emerging technologies and defining solutions and enabling strategies to capitalize on them. Ability to build relationships across business and technical constituencies and to define technology solutions on a business basis. James is co-author of the book: Intel® Trusted Execution Technology for Servers: A Guide to More Secure Datacenters and contributed to the US National Institute for Standards and Technology (NIST) Interagency Report 7904 - Trusted Geolocation in the Cloud: A Proof Of Concept Implementation. He has been a speaker, panelist or presenter at a number of public venues, such as the International Forum on Cyber Security, SecureTech Canada, Intel Developer Forum, OpenStack Conference and Design Summit, InfoSec World 2011, vmworld, ENSA@Work, RSA in US and Europe, CSA Congress, HP Discover and other industry forums.