4th Gen AMD EPYC™ Processors Are the Natural Choice for Artificial Intelligence Workloads

raghu_nambiar · 08-15-2024

Artificial Intelligence (AI) has evolved from a distant concept into a crucial component of our everyday lives, impacting everything from personalized streaming services to cutting-edge healthcare diagnostics. Its broad adoption is revolutionizing industries, increasing productivity, and improving user experiences across multiple sectors. AI streamlines complex processes and transforms the way we live, work, and interact with technology.

Here are some of the key applications of AI in the enterprise sector.

Recommendation Engines: AI and machine learning analyze user behavior and preferences to deliver personalized suggestions in e-commerce, content streaming, and social media. This enhances user engagement by recommending products, services, or content that aligns with user interests.
Customer Service Automation: AI-driven chatbots and virtual assistants manage customer inquiries, provide instant responses, and resolve issues around the clock. This boosts customer satisfaction, accelerates response times, and reduces operational costs.
Predictive Maintenance: AI and machine learning monitor mechanical system data in real time to detect anomalies and predict when maintenance is needed before a failure occurs. This approach minimizes unscheduled downtime, lowers maintenance expenses, and extends equipment lifespan. It is widely used in the context of digital twins.
Fraud Detection and Prevention: AI algorithms monitor transactions to detect unusual patterns or anomalies that could indicate fraudulent activity. This enhances security, protects financial assets, and mitigates fraud risk.
Targeted Marketing: AI examines customer data to craft marketing campaigns and personalized recommendations. This approach improves engagement, increases conversion rates, and optimizes marketing budgets.
Supply Chain Optimization: AI forecasts demand, optimizes inventory levels, and streamlines complex global logistics and supply chain operations. This reduces operational costs, boosts capital and inventory efficiency, and optimizes product delivery.
Document Understanding with LLMs: Large Language Models (LLMs) process and extract information from unstructured text documents, such as contracts and reports. This improves data extraction efficiency, automates document review, and supports informed decision-making with insights from extensive text data.

The infrastructure needed to run these AI-based business augmentations is not always clear. This blog presents a series of evidence-based comparisons between AMD EPYC™ processors and their competitors, focusing on critical AI workloads.

AMD and AI

AMD is strategically positioned with a broad range of workload-optimized compute engines and technologies that form the foundation of efficient AI platforms of any size. The recently launched AMD Ryzen™ 7040 Series Processors feature the world’s first dedicated AI engine in an x86 processor for both consumer and commercial PCs. Various industries rely on AMD Alveo™ accelerators, AMD Versal™ adaptive SoCs, and leading FPGAs for AI-based image detection and advanced automotive driver-assist and safety functions, with deployments that include NASA’s Mars rovers. The AMD Instinct™ MI300X accelerator, with 153 billion transistors, is designed to deliver leadership performance for Generative AI workloads and HPC applications.

AMD EPYC processors hold hundreds of world records for performance and efficiency across a wide range of enterprise workloads, including modern data center AI applications. They are ideal for businesses seeking to integrate AI into various applications while maintaining a unified x86-based infrastructure for tasks such as databases, big data, and natural language processing, including chatbots and other AI functions. For more demanding training and inference requirements, the AMD Instinct™ MI series GPUs offer the extra power and performance needed.

End-to-End AI

The many use cases described above indicate that “AI” implementations can take many forms and involve a wide range of processes. Training and inference are key parts of an AI pipeline that also demands substantial computing power for data cleaning and transformation, data preparation and labeling, scoring and serving, and turning results into actionable business insights. Given this diversity of implementations and processes, and the multiple architectural options available, it is helpful to have testing tools that characterize performance.

The Transaction Processing Performance Council (TPC) TPCx-AI benchmark evaluates the entire AI pipeline. It consists of a detailed dataset structure for a retail data center that incorporates important business data such as customer details, orders, financials, and product information. It spans a variety of enterprise use cases like customer segmentation, conversation transcription, sales forecasting, spam detection, price prediction, classification, and fraud detection. The results of this benchmark are also audited and published, thereby providing significant insight into relative performance and efficiency expectations.

AMD tested TPCx-AI performance using a 30 GB dataset (Scale Factor 30). A 2P system powered by 96-core AMD EPYC 9654 processors delivered a ~1.65x uplift over a comparable 2P system powered by 64-core Intel® Xeon® Platinum 8592+ processors. On a per-core basis, the AMD EPYC 9654 system delivers a ~1.10x uplift over the Intel Xeon Platinum 8592+ system. Please see Leadership End-to-End AI Performance to learn more.

Figure 1: End-to-end AI performance uplifts.
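The per-core figure quoted above follows from normalizing the system-level uplift by total core counts. Here is a minimal sketch of that arithmetic, assuming both systems are fully populated 2P configurations (2 x 96 cores versus 2 x 64 cores). The function name is illustrative, and because the ~1.65x input is itself rounded, the result is approximate.

```python
def per_core_uplift(system_uplift: float, cores_a: int, cores_b: int) -> float:
    """Convert a system-level uplift of A over B into a per-core uplift.

    Per-core throughput is system throughput divided by core count, so the
    ratio of per-core throughputs is the system uplift scaled by cores_b / cores_a.
    """
    return system_uplift * cores_b / cores_a


# Core counts taken from the text: 2P x 96-core EPYC 9654 vs 2P x 64-core Xeon 8592+.
amd_cores = 2 * 96
intel_cores = 2 * 64

# ~1.65x is the rounded system-level TPCx-AI uplift quoted in the text.
result = per_core_uplift(1.65, amd_cores, intel_cores)
print(f"per-core uplift ~ {result:.2f}x")
```

With the quoted inputs, this reproduces the ~1.10x per-core figure.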

Gradient Boosting

Gradient boosting is a machine learning technique used for both regression and classification tasks. XGBoost (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosting algorithm that handles large datasets and trains rapidly thanks to its support for parallel processing. It also gracefully handles missing values, which allows processing incomplete real-world data without significant pre-processing. This performance and versatility make XGBoost a good choice for a variety of applications, examples of which are embodied in a number of use case models and data sets included in the repository. These cases serve well as test workloads.
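To make the underlying idea concrete, here is a minimal pure-Python sketch of gradient boosting for regression with squared-error loss, where each round fits a one-split "stump" to the current residuals. This is not XGBoost's implementation or API. XGBoost adds regularization, second-order gradient information, missing-value handling, and parallel tree construction. All names and the toy data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on 1-D inputs that best fits the residuals
    in the least-squares sense. Returns (threshold, left_mean, right_mean)."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def predict(model, x):
    """Base prediction plus the shrunken sum of all stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """For squared error, the negative gradient is simply the residual,
    so each round fits a stump to what the current model still gets wrong."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_rounds):
        model = (base, learning_rate, stumps)
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return (base, learning_rate, stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (hypothetical): a noisy step function.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline_mse = mse((sum(ys) / len(ys), 0.3, []), xs, ys)  # predict the mean only
model = fit_gradient_boosting(xs, ys)
final_mse = mse(model, xs, ys)
print(f"baseline MSE {baseline_mse:.4f} -> boosted MSE {final_mse:.4f}")
```

The learning rate shrinks each stump's contribution, which is the same trade-off between speed of fit and overfitting that boosting libraries expose as a tunable parameter.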

A 2P AMD EPYC 9654 system achieved a ~1.38x inference uplift on the airline use case versus a 2P Intel Xeon Platinum 8592+ system. Please see Leadership XGBoost Performance to learn more.

Figure 2: XGBoost performance uplifts

Similarity Search

Similarity search is a fundamental information retrieval and data mining task that involves finding the objects that are most similar to a given query object. The goal is to identify items in a dataset that closely resemble a reference item based on some defined similarity measure. The Facebook AI Similarity Search (FAISS) library enables fast, scalable searches for similar multimedia files. It goes beyond legacy databases by allowing k-nearest-neighbor (KNN) searches across large datasets with a tunable balance of memory use, speed, and accuracy.
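As a point of reference, the simplest form of k-nearest-neighbor search is an exhaustive scan that scores every stored vector against the query. The sketch below shows that baseline in pure Python using squared Euclidean distance. It does not use the FAISS API. Libraries such as FAISS exist precisely to avoid or accelerate this full scan on large datasets, for example with compressed indexes. The function names and toy vectors are illustrative.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(database, query, k):
    """Exhaustive k-nearest-neighbor search.

    Returns a list of (distance, index) pairs for the k database vectors
    closest to the query, nearest first.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


# Toy 2-D "descriptors" (hypothetical data).
database = [
    (0.0, 0.0),
    (1.0, 1.0),
    (5.0, 5.0),
    (0.9, 1.2),
]

neighbors = knn_search(database, query=(1.0, 1.0), k=2)
nearest_ids = [idx for _, idx in neighbors]
print(nearest_ids)
```

The cost of this baseline grows linearly with the number of stored vectors, which is why approximate indexes trade a little accuracy for much lower memory use and latency.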

AMD tested Faiss.Index PQ.ST_PQ (Product Quantization with Subspace Tree) using the sift1m Scale-Invariant Feature Transform (SIFT) image descriptor dataset. A 2P AMD EPYC 9654 system achieved a ~2.04x inference uplift versus a 2P Intel Xeon Platinum 8592+ system. Please see Broad Spectrum AI Workload Performance Leadership to learn more.

Figure 3: FAISS performance uplift

Multi-Task Learning

Multi-task learning (MTL) is a machine learning paradigm that trains a model to perform multiple simultaneous tasks. MTL improves overall performance by eschewing training independent models for each task in favor of leveraging shared knowledge across tasks. Multi-gate Mixture-of-Experts (MMoE) is an advanced neural network architecture designed for MTL scenarios that simultaneously learns multiple related tasks. MMoE builds on the concept of mixture-of-experts models by introducing multiple gating mechanisms to better handle diverse and interconnected tasks.
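The gating idea can be sketched in a few lines: all tasks share the same experts, and each task has its own softmax gate that decides how to weight the expert outputs for a given input. The pure-Python forward pass below is a minimal illustration under simplifying assumptions (single-layer ReLU experts, linear gates, no task-specific tower networks, hand-picked weights). It is not the model or code AMD benchmarked.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_forward(x, expert_weights, gate_weights):
    """One MMoE layer.

    expert_weights: one (hidden x input) matrix per shared expert.
    gate_weights:   one (n_experts x input) matrix per task.
    Returns one mixed hidden vector per task, which a task-specific
    tower network would normally consume next.
    """
    # Every expert sees the same input and is shared by all tasks.
    expert_outputs = [[max(0.0, h) for h in matvec(w, x)] for w in expert_weights]
    hidden_size = len(expert_outputs[0])

    task_inputs = []
    for gate in gate_weights:
        mix = softmax(matvec(gate, x))  # per-task weights over experts, sum to 1
        mixed = [
            sum(m * out[h] for m, out in zip(mix, expert_outputs))
            for h in range(hidden_size)
        ]
        task_inputs.append(mixed)
    return task_inputs


# Hypothetical weights: two experts, two tasks, 2-D input.
x = [1.0, 2.0]
expert_weights = [
    [[1.0, 0.0], [0.0, 1.0]],   # expert 0 passes the input through
    [[0.0, 1.0], [1.0, 0.0]],   # expert 1 swaps the two features
]
gate_weights = [
    [[0.0, 0.0], [0.0, 0.0]],   # task 0: uniform mix of both experts
    [[10.0, 0.0], [0.0, 0.0]],  # task 1: strongly prefers expert 0
]

task_inputs = mmoe_forward(x, expert_weights, gate_weights)
print(task_inputs)
```

Because each task has its own gate, two tasks can draw on the same experts in different proportions, which is what lets the architecture share knowledge without forcing every task through an identical representation.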

AMD tested a Taobao dataset containing 8M records. A 2P AMD EPYC 9654 system achieved a ~1.45x inference uplift versus a 2P Intel Xeon Platinum system. Please see Broad Spectrum AI Workload Performance Leadership to learn more.

Figure 4: MMoE performance uplift

Multitude of Decision Trees

Random Forest is a machine learning algorithm that creates a collection of decision trees that are each trained on a random subset of data and features. It then combines their predictions to improve accuracy and generalization, making it effective for both classification and regression tasks. Its ability to handle large datasets, avoid overfitting, and provide insights into feature importance makes it widely used in domains spanning finance, healthcare, and beyond.
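The two sources of randomness described above, a bootstrap sample of rows and a random subset of features, followed by a majority vote, can be sketched in pure Python. To stay short, this illustration uses depth-1 trees ("stumps") and picks the feature subset once per tree, whereas full Random Forest implementations grow deeper trees and typically resample features at every split. The names and toy data are assumptions for illustration only.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors.
    Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no valid split: fall back to a constant prediction
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, features_per_tree=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: with replacement
        feature_ids = rng.sample(range(len(rows[0])), features_per_tree)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample],
                                feature_ids))
    return forest


def predict_forest(forest, row):
    """Combine the trees by majority vote."""
    return majority([predict_stump(s, row) for s in forest])


# Toy data (hypothetical): two well-separated classes in 2-D.
rows = [(1, 2), (2, 1), (1, 1), (2, 2), (3, 2),
        (7, 8), (8, 7), (8, 8), (7, 7), (9, 8)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
low_prediction = predict_forest(forest, (0, 0))
high_prediction = predict_forest(forest, (10, 10))
print(low_prediction, high_prediction)
```

Each tree is trained independently of the others, so both training and inference parallelize naturally across cores.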

AMD performed random forest testing using an airline dataset with 1M rows of data to predict the likelihood of flight delays. The 2P AMD EPYC 9654 system achieved a ~1.36x inference uplift versus the 2P Intel Xeon Platinum 8592+ system, a testament to the high core counts and high-performance “Zen 4” cores found in 4th Gen AMD EPYC processors. Please see Broad Spectrum AI Workload Performance Leadership to learn more.

Figure 5: Random Forest performance uplift

Large Language Models

The rapid evolution of Large Language Models (LLMs) is spurring equally rapid adoption for in-house applications that include chatbots, summarization, and extracting information from unstructured text documents like contracts and reports. 4th Gen AMD EPYC processors offer the performance and cost-effectiveness needed for smaller, enterprise-class models (on the order of 10-13 billion parameters) and inference tasks. Again, customers with even larger models and real-time training needs can leverage the power of AMD Instinct MI300 series or other third-party AI accelerators. The “Llama” family of models published by Meta is a leading example of the LLM category.

AMD tested Llama2-7B and Llama3-8B on both a 2P AMD EPYC 9654 system and a 2P Intel Xeon Platinum 8592+ system. The 2P AMD EPYC 9654 system achieved a ~1.21x inference uplift versus the 2P Intel Xeon Platinum 8592+ system. Please see Leadership Natural Language AI Performance: Outperforming 5th Gen Intel® Xeon® with AMX to learn more.

Figure 6: Llama2-7B multi-instance uplift

Recommendation Engine

In the previous sections of the blog, I compared AMD EPYC processors with Intel Xeon processors. Now, let's explore a comparison of AMD and Intel-based instances on AWS to uncover their respective strengths and performance characteristics.

The Deep Learning Recommendation Model (DLRM) is a framework that enhances recommendation systems using deep learning techniques to deliver precise personalized recommendations by integrating user and item features. This example showcases using DLRM with Amazon EC2 instances.

Amazon EC2 HPC7a.96xlarge instances powered by 4th Gen AMD EPYC processors provide a ~1.44x performance uplift, a ~1.93x performance/$ uplift, and ~1.48x cloud OpEx savings versus Amazon EC2 M7i.48xlarge instances powered by Intel Xeon “Sapphire Rapids” processors on Deep Learning Recommendation Engine (DLRMv2) at Int8 precision.[1]

Figure 7: Amazon EC2 HPC7a.96xlarge DLRMv2 performance, perf/$, and cloud OpEx uplifts
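The performance-per-dollar figure can be approximated from the performance uplift and the on-demand hourly prices listed in the endnote ($7.2/hr for HPC7a.96xl and $9.6768/hr for M7i.48xl). The sketch below shows that arithmetic. The function name is illustrative, and because the ~1.44x input is rounded, the result lands close to, but not exactly on, the published ~1.93x.

```python
def perf_per_dollar_uplift(perf_uplift: float, price_a: float, price_b: float) -> float:
    """Performance-per-dollar uplift of instance A over instance B.

    perf/$ = performance / hourly price, so the ratio of perf/$ values is the
    performance uplift multiplied by price_b / price_a.
    """
    return perf_uplift * price_b / price_a


# On-demand hourly prices quoted in the endnote (US-East (Ohio), as of 6/11/2024).
hpc7a_price = 7.2      # HPC7a.96xl
m7i_price = 9.6768     # M7i.48xl

price_ratio = m7i_price / hpc7a_price
ratio = perf_per_dollar_uplift(1.44, hpc7a_price, m7i_price)
print(f"price ratio ~ {price_ratio:.3f}, perf/$ uplift ~ {ratio:.3f}x")
```

The calculation shows that the perf/$ advantage combines two effects: higher measured throughput and a lower hourly price.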

Conclusion

AI advancements are changing how we live and work. AMD is at the forefront with a broad range of compute engines and technologies designed for efficient AI platforms, from edge to datacenters. This includes the AMD Ryzen™ 7040 Series Processors, AMD Versal™ adaptive SoCs, AMD EPYC™ processors, and AMD Instinct™ MI accelerators. The 4th Gen AMD EPYC processors stand out with their high core densities, extensive memory bandwidth, and exceptional efficiency, making them highly effective for enterprise AI tasks and surpassing the 5th Gen Intel Xeon Platinum processors in the comparisons presented above. As AI technology advances, selecting the right hardware becomes crucial for achieving optimal performance. AMD stands out as the leading choice, offering solutions that enhance AI capabilities across the entire technology stack.

Endnote

SP5C-066: AWS m7a.48xl average scores and Cloud OpEx savings comparison to M7i.48xl running Deep Learning Recommendation Model (dlrm-v2.99) at Int8 precision with OneDNN library and IPEX extension with batch size = 2000 using on-demand pricing US-East (Ohio) Linux® as of 6/11/2024 of M7i.48xl: $9.6768/hr. HPC7a.96xl: $7.2/hr. AWS pricing: https://aws.amazon.com/ec2/pricing/on-demand/. Cloud performance results presented are based on the test date in the configuration. Results may vary due to changes to the underlying configuration, and other conditions such as the placement of the VM and its resources, optimizations by the cloud service provider, accessed cloud regions, co-tenants, and the types of other workloads exercised at the same time on the system.