
Llama 3.2 and AMD: Optimal Performance from Cloud to Edge and AI PCs

Ramine_Roane
Staff

AMD welcomes the latest Llama 3.2 release from Meta. Llama 3.2 is designed to make developers more productive, helping them build the next generation of experiences and save development time, with a greater focus on data privacy and responsible AI innovation. The emphasis on openness and customization has driven 10x year-over-year growth in Llama model downloads, making Llama a leading choice for developers seeking efficient, easy-to-use AI tools. Llama 3.2 lets developers work with advanced multimodal models that process text and visual data at once. These open-source foundation models drive the speed and breadth of AI innovation, giving developers broad access to the latest advancements along with more features, control, and safety.

 

AMD has a long-standing collaborative engagement with Meta. We continue to optimize AI performance across our platforms for Meta models, now including Llama 3.2. Our collaboration with Meta enables Llama 3.2 developers to build new agentic applications and personalized AI experiences from cloud to edge and AI PCs, with excellent performance and power efficiency.

 

AMD Instinct™ MI300X GPU Accelerators and Llama 3.2

 

AMD Instinct™ MI300X accelerators are transforming the landscape of multimodal AI models such as Llama 3.2, which includes 11B- and 90B-parameter vision models. These models require immense computational resources and memory bandwidth to process text and visual data.

 

As demonstrated at the Llama 3.1 launch, AMD Instinct™ accelerators deliver unmatched memory capability1, enabling a single server with 8 MI300X GPUs to fit the largest open-source model available today, with 405B parameters, in the FP16 datatype2, something no other 8x GPU platform can achieve. With the launch of Llama 3.2, AMD Instinct™ MI300X accelerators are equipped to support current and future variants of these multimodal models with great memory efficiency. This industry-leading memory capacity simplifies infrastructure management by reducing the complexity of distributing a model across multiple devices, enabling fast training, real-time inference, and seamless handling of large datasets across modalities like text and images, without compromise or the network overhead of spreading the workload across several servers.
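
As a rough sanity check of that capacity claim, the arithmetic is straightforward. The sketch below is illustrative only; real deployments also need memory for the KV cache, activations, and runtime overhead.

```python
# Back-of-the-envelope check: can one 8x MI300X server hold a 405B model in FP16?
# Assumes 2 bytes per parameter (FP16) and 192 GB of HBM3 per MI300X; ignores
# KV cache, activations, and runtime overhead, which add to the real footprint.
params = 405e9                                 # 405B parameters
bytes_per_param = 2                            # FP16
weights_gb = params * bytes_per_param / 1e9    # ~810 GB of weights
total_hbm_gb = 8 * 192                         # 1536 GB across one 8-GPU server
print(f"Weights: {weights_gb:.0f} GB, HBM available: {total_hbm_gb} GB")
print(f"Fits on a single 8x MI300X server: {weights_gb < total_hbm_gb}")
```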

 

For organizations, this can mean significant cost savings, enhanced performance efficiency, and streamlined operations, all powered by the advanced memory capabilities of the AMD Instinct™ MI300X platform.

 

Meta has also leveraged AMD ROCm™ software and AMD Instinct™ MI300X accelerators across key stages of Llama 3.2 development, further strengthening its long-standing collaboration with AMD and its commitment to an open software approach to AI. Scalable AMD infrastructure enables developers to build powerful visual reasoning and understanding applications, delivering open-model flexibility with the performance to rival closed models.

 

With the launch of the Llama 3.2 generation of models, developers now have Day-0 support for Meta's latest frontier models on the latest generation of AMD Instinct™ MI300X GPUs, giving them a broader choice of GPU hardware and an open software stack, ROCm™, for further application development.
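
As an illustration of what Day-0 support can look like in practice, the sketch below serves a Llama 3.2 model through vLLM, a popular open inference engine with a ROCm backend. This is a hypothetical example rather than AMD's official recipe; the model ID and parallelism settings are assumptions, and Meta's weights are gated and require approved Hugging Face access.

```python
# Minimal sketch: serving a Llama 3.2 model on an 8x MI300X node with vLLM.
# Assumes a ROCm build of vLLM and approved access to Meta's gated weights;
# the model ID, sharding, and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",  # illustrative model ID
    tensor_parallel_size=8,                            # shard across the 8 GPUs
)
sampling = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What can a multimodal model do?"], sampling)
print(outputs[0].outputs[0].text)
```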

 

AMD EPYC™ CPUs and Llama 3.2

Many AI workloads run on CPUs today, either entirely on the CPU or in combination with GPUs. AMD EPYC™ processors provide the performance and energy efficiency to serve the innovative models developed by Meta, including the new Llama 3.2. While much of the recent focus has been on LLM (large language model) innovations with huge data sets, the emergence of SLMs (small language models) is important to note. These smaller models can be customized and tailored to specific enterprise datasets, help mitigate risks around the security and privacy of sensitive data, and require much less compute infrastructure. They are designed to be agile, efficient, and performant, making them right-sized for a broad range of enterprise and industry-specific applications.

The new features in the Llama 3.2 release, including multimodal models with image reasoning and smaller model options (1 billion and 3 billion parameters), reflect many mass-market enterprise deployment scenarios, especially for customers exploring CPU-based AI implementations.

With Llama 3.2 models, leadership AMD EPYC™ processors give enterprises compelling performance and efficiency as they consolidate data center infrastructure, while preserving the ability to expand to GPU- or CPU-based deployments for larger AI models as needed, using AMD EPYC™ CPUs and AMD Instinct™ GPUs.
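
To make the right-sizing point concrete, here is a minimal sketch of CPU-only inference with the 1B Llama 3.2 model using Hugging Face Transformers. It is illustrative rather than an AMD-tuned configuration; the model ID and prompt are assumptions, and the gated weights require approved access.

```python
# Minimal sketch: CPU-only inference with a Llama 3.2 small language model (SLM).
# Assumes torch and transformers are installed and approved access to Meta's
# gated weights; the model ID and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # 1B SLM; a 3B variant also exists
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Summarize the key benefits of small language models:",
                   return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```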

 

Llama 3.2 with AMD AI PCs powered by Radeon™ and Ryzen™ AI

 

For users looking to run Llama 3.2 locally on their own PCs, AMD has worked closely with Meta to optimize the latest models for AMD Ryzen™ AI PCs and AMD Radeon™ graphics cards. AMD AI PCs equipped with DirectML-supported AMD GPUs can run Llama 3.2 locally, accelerated via DirectML AI frameworks optimized for AMD. Additionally, we ensured Day-0 support for Llama 3.2 (1B and 3B) on the NPU of Ryzen™ AI, further enhancing local performance and efficiency. Windows users will soon be able to experience multimodal Llama 3.2 in a consumer-friendly package through AMD partner LM Studio.
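
For readers who want to experiment before those packages arrive, the sketch below shows one generic way to target an AMD GPU from PyTorch on Windows via the torch-directml package. It is a hypothetical illustration, not the AMD-optimized DirectML path described above; the model ID is an assumption.

```python
# Minimal sketch: running a small Llama 3.2 model on an AMD GPU through DirectML.
# Assumes the torch-directml package plus transformers, and approved access to
# Meta's gated weights; illustrates the generic DirectML path only.
import torch_directml
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch_directml.device()               # DirectML device for the AMD GPU
model_id = "meta-llama/Llama-3.2-1B-Instruct"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Write one sentence about local AI.",
                   return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```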

 

The latest AMD Radeon™ graphics cards, the AMD Radeon™ PRO W7900 Series with up to 48GB of memory and the AMD Radeon™ RX 7900 Series with up to 24GB, feature up to 192 AI accelerators capable of running cutting-edge models such as Llama 3.2 11B Vision. Leveraging the same AMD ROCm™ 6.2 optimized framework from the collaboration between Meta and AMD, users can try out the latest models today on PCs equipped with these cards3.
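
Before loading a model, it can help to confirm that a ROCm build of PyTorch actually sees the Radeon card. A minimal sketch, assuming a ROCm-enabled PyTorch install:

```python
# Minimal sketch: confirm a ROCm build of PyTorch can see the Radeon GPU before
# loading a model. ROCm builds expose AMD GPUs through torch's "cuda" namespace.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. a Radeon PRO W7900
    print("HIP runtime:", torch.version.hip)      # set on ROCm builds, else None
else:
    print("No ROCm-visible GPU; check the ROCm compatibility matrix (note 3).")
```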

 

 

AMD and Meta: Advancement through Collaboration

 

In conclusion, AMD is advancing generative AI innovation in collaboration with Meta, helping ensure developers are well-equipped to handle each new release seamlessly with Day-0 support across our broad AI portfolio. The integration of Llama 3.2 with AMD Instinct™ MI300X GPUs, AMD EPYC™ CPUs, AMD Ryzen™ AI, AMD Radeon™ GPUs, and AMD ROCm™ software gives users flexibility of solution choice to fuel their innovations from cloud to edge to AI PCs.

 

 

 

©2024, Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD EPYC™, AMD Instinct™, AMD Radeon™, AMD ROCm™, AMD Ryzen™ AI, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

 

Claims

1 MI300-05A: Calculations conducted by AMD Performance Labs as of November 17, 2023, for the AMD Instinct™ MI300X OAM accelerator 750W (192 GB HBM3) designed with AMD CDNA™ 3 5nm FinFET process technology resulted in 192 GB HBM3 memory capacity and 5.325 TB/s peak theoretical memory bandwidth performance. The MI300X memory bus interface is 8,192 bits and the memory data rate is 5.2 Gbps, for total peak memory bandwidth of 5.325 TB/s (8,192 bits memory bus interface * 5.2 Gbps memory data rate / 8).

The highest published results on the NVIDIA Hopper H200 (141GB) SXM GPU accelerator resulted in 141GB HBM3e memory capacity and 4.8 TB/s GPU memory bandwidth performance.
https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446

The highest published results on the NVIDIA Hopper H100 (80GB) SXM5 GPU accelerator resulted in 80GB HBM3 memory capacity and 3.35 TB/s GPU memory bandwidth performance.

 

2 https://artificialanalysis.ai/

 

Endnotes:
 

3 For a full list of Radeon™ parts supported by ROCm™ software as of 5/1/2024, go to https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html. GD-241

 

 

CAUTIONARY STATEMENT

 

This blog contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the expected features and benefits of AMD EPYC™ processors; AMD Ryzen™ AI PCs; AMD advancing generative AI innovation, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this blog are based on current beliefs, assumptions, and expectations, speak only as of the date of this blog and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s most recent reports on Forms 10-K and 10-Q. AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this blog, except as may be required by law.