
A deep technical overview of the new MoE Align & Sort algorithm. By fully enabling concurrent execution of multiple blocks with arbitrary expert counts, and by aggressively using shared memory and registers, MoE Align & Sort delivers significant performance gains on AMD hardware, providing up to a 10x acceleration on AMD Instinct™ MI100 GPUs and 7x on AMD Instinct MI300X/MI300A GPUs.
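To make the alignment step concrete, below is a minimal host-side sketch (in Python/NumPy, not the actual ROCm kernel) of what MoE Align & Sort computes: tokens routed to each expert are counted, per-expert counts are padded to a block-size multiple so each GPU block processes tokens of a single expert, and token ids are scattered into expert-contiguous order. The function name, `block_size` parameter, and padding convention here are illustrative assumptions.

```python
import numpy as np

def moe_align_and_sort(topk_expert_ids, num_experts, block_size):
    """Illustrative CPU sketch of the MoE token alignment step."""
    flat = topk_expert_ids.reshape(-1)
    # Histogram: tokens routed to each expert.
    counts = np.bincount(flat, minlength=num_experts)
    # Pad each expert's count up to a multiple of block_size so every
    # thread block works on tokens belonging to one expert only.
    padded = ((counts + block_size - 1) // block_size) * block_size
    # Exclusive prefix sum gives each expert's start offset.
    offsets = np.concatenate(([0], np.cumsum(padded)))[:-1]
    total = int(padded.sum())
    # Unused pad slots point one past the last valid token id.
    sorted_ids = np.full(total, flat.size, dtype=np.int64)
    cursor = offsets.copy()
    # Scatter token ids into expert-contiguous positions.
    for token_id, e in enumerate(flat):
        sorted_ids[cursor[e]] = token_id
        cursor[e] += 1
    return sorted_ids, offsets, counts
```

On a GPU, the histogram and scatter phases are what benefit from shared memory (per-block expert counters) and multi-block concurrency; the sketch above only captures the data layout they produce.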

Customers evaluating AI infrastructure today rely on a combination of industry-standard benchmarks and real-world model performance metrics—such as those from Llama 3.1 405B, DeepSeek-R1, and other leading open-source models—to guide their GPU purchase decisions.
At AMD, we believe that delivering value across both dimensions is essential to driving broader AI adoption and real-world deployment at scale. That’s why we take a holistic approach—optimizing performance for rigorous industry benchmarks like MLPerf while also enabling Day 0 support and rapid tuning for the models most widely used in production by our customers. This strategy helps ensure AMD Instinct™ GPUs deliver not only strong, standardized performance, but also high-throughput, scalable AI inference across the latest generative and language models used by customers.
In this blog, we explore how AMD’s continued investment in benchmarking, open model enablement, software and ecosystem tools helps unlock greater value for customers—from MLPerf Inference 5.0 results to Llama 3.1 405B and DeepSeek-R1 performance, ROCm software advances, and beyond.
Nathan Nadarajah, Senior Fellow and Security Architect at AMD, recently sat down with me to answer some questions about GPU security. In his two decades with the company, Nadarajah has worked on GPU drivers, GPU firmware and security firmware, and his expertise spans both the use of GPUs in enterprise data centers and in consumer gaming workstations.
Cloud computing providers and leading technology companies are investing in cutting-edge AI chips to power the next generation of innovation.

Today, we continue to celebrate the dedication of the El Capitan supercomputer with Lawrence Livermore National Laboratory (LLNL), in collaboration with the National Nuclear Security Administration (NNSA) and Hewlett Packard Enterprise (HPE).
The AI era is here, and it's hungry—hungry for performance, scalability, and efficiency. Whether you're building next-gen data centers, fine-tuning your AI/ML workloads, or crafting cutting-edge HPC solutions, one thing is clear: the right ingredients matter. This blog will guide you through the AMD recommended ingredients, the secret sauce, and the cooking techniques needed to create an AI/HPC infrastructure that’s as efficient as it is powerful. Let’s get cooking.