AMD Instinct™ accelerators have been gaining significant traction and publicity as their adoption grows across major HPC centers. This is evident from the most recent Top500 list of the world's supercomputers published by Top500.org, where AMD Instinct accelerators power the fastest supercomputer in the world.
Much of the success of AMD Instinct accelerators comes from their ability to deliver outstanding performance at scale. The Department of Energy's choice of AMD Instinct accelerators and AMD EPYC™ CPUs for two of the three planned exascale supercomputers in the United States reflects high confidence in both the capability of the products and AMD's ability to deliver exceptional performance and user experience to a broad set of HPC users.
The Frontier supercomputer at Oak Ridge National Laboratory, powered by AMD Instinct MI250X accelerators and AMD EPYC CPUs, was the first supercomputer to officially pass the exascale barrier and, according to the latest Top500 list, was the fastest in the world. AMD Instinct MI250X accelerators also power the #3 supercomputer, LUMI at CSC in Finland, and the #10 supercomputer, Adastra at CINES.1 For reference, AMD Instinct MI250X GPUs and EPYC CPUs deliver over 30% of all FLOPS on the June '22 Top500 list.1 This strong preference for AMD Instinct GPUs among HPC users is driven by several factors. Key among them are:
AMD made a major design shift in GPU architecture starting with the AMD Instinct MI100 products, with a new focus on compute-intensive use cases like HPC and AI/ML training. The needs of HPC and AI/ML were shifting rapidly, and AMD was the first GPU vendor to address this trend with a dedicated GPU architecture. The result was AMD CDNA™, the first compute-focused architecture, optimized to push the limits of floating-point operations per second.
The latest AMD CDNA™ 2 architecture builds on the core strengths of the original AMD CDNA architecture to deliver a leap forward in accelerator performance and usability while using a similar process technology. The AMD CDNA architecture was an excellent starting point for a computational platform; however, to deliver exascale performance, the architecture was overhauled with enhancements from the compute units to the memory interface, with particular emphasis on radically improving the communication interfaces for full-system scalability.
The image below puts the benefits of building a dedicated architecture like AMD CDNA in stark contrast. The improvement in FP64 performance of AMD Instinct GPUs over time, compared with the other vendor, validates the strong interest from key HPC users.
Figure 1: Historical FP64 GPU performance over time (For illustration only).
The AMD ROCm software stack was built on three key principles. First, accelerated computing requires a platform that unifies processors and accelerators when it comes to system resources; while they play different roles for different workloads, they need to work together effectively and have equal access to resources such as memory. Second, a rich ecosystem of software libraries and tools should enable portable, performant code that can take advantage of new capabilities. Third, an open-source approach should empower vendors, customers, and the entire community.
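As one illustration of that ecosystem portability, frameworks such as PyTorch ship ROCm builds that expose the same device API used on other GPU platforms, so existing GPU code typically ports over unchanged. The sketch below is a minimal check, assuming a ROCm build of PyTorch is installed on the system; the exact output depends on your hardware and driver stack.

```python
# Minimal sketch: verifying that a ROCm build of PyTorch sees an AMD GPU.
# Assumes the ROCm build of PyTorch is installed; the calls below are
# standard PyTorch APIs, but the printed values depend on your system.
import torch

# On ROCm builds, torch.version.hip is a version string (None on CUDA builds),
# and the familiar torch.cuda.* device API is backed by HIP.
print("HIP runtime:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # The same tensor code runs unchanged on AMD or other GPU back ends.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).shape)
```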
Having a strong hardware and software foundation gives AMD Instinct accelerators an ideal launchpad to accelerate HPC applications across several verticals. The AMD Infinity Hub is a growing collection of HPC and AI framework and HPC application containers across domains such as life sciences, physics, quantum chemistry, and more. To showcase the performance of AMD Instinct accelerators, we selected a set of HPC applications and compared them against another GPU vendor. Performance was measured on a Gigabyte server with four AMD Instinct™ MI250 GPUs and referenced against the other data center accelerator vendor's publicly released data from its benchmark site, and against AMD testing lab results where published benchmarks were not available.
We believe results should reflect actual delivered performance, be replicable by third parties for validation, and be relevant to users' real-world use cases. Where possible, AMD provides readily deployable HPC application and benchmark containers on the AMD Infinity Hub for replication testing.4
All performance results for AMD Instinct MI250 were calculated as a geometric mean across multiple datasets and compared to published results from the other vendor, except the OpenMM and HPCG results, which were measured in the AMD testing labs, also on a geometric-mean basis across multiple datasets.
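For readers replicating this methodology, a geometric mean of per-dataset speedups can be computed in a few lines of Python. This is a minimal sketch of the calculation only; the speedup values below are illustrative placeholders, not measured results.

```python
# Minimal sketch of the geometric-mean methodology described above.
# The speedup values here are illustrative placeholders, not measured data.
import math

def geomean(values):
    """Geometric mean: the nth root of the product of n values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-dataset speedups (MI250 vs. reference) for one application.
speedups = [1.25, 1.40, 1.32]
print(f"Geomean speedup: {geomean(speedups):.2f}x")
```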
In test runs using one GPU, AMD Instinct MI250 outperformed an A100 80GB SXM across all applications tested, delivering ~1.3x higher performance on AMBER and ~1.8x higher performance on OpenMM.5,6,7,8,9
Figure 2: 1x AMD Instinct™ MI250 GPU Performance.5,6,7,8,9
In the multi-GPU runs, AMD Instinct MI250 showcases both the compute capability of the Instinct GPUs and the peer-to-peer interconnect performance of high-speed AMD Infinity Fabric™ technology, which provides up to 400 GB/s of total theoretical I/O bandwidth per MI250 in this test platform10, delivering up to 1.7x higher performance on OpenMM, with AMBER following at ~1.4x.6,8
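As a rough sanity check, an aggregate theoretical I/O figure like this is simply the per-link bandwidth multiplied by the number of links. The sketch below assumes, for illustration only, 8 Infinity Fabric links at 50 GB/s each, which reproduces the 400 GB/s total cited above; the actual link count and per-link rate should be verified against the platform's configuration.

```python
# Illustrative arithmetic only: how an aggregate theoretical I/O figure
# is composed from individual links. The link count and per-link rate
# below are assumptions for illustration, not a platform specification.
links = 8            # assumed number of Infinity Fabric links in the topology
gb_s_per_link = 50   # assumed per-link theoretical bandwidth in GB/s

total = links * gb_s_per_link
print(f"Total theoretical I/O bandwidth: {total} GB/s")  # 400 GB/s
```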
Figure 3: 4x AMD Instinct™ MI250 GPU Performance.5,6,7,8,9
The individual modules within these major applications highlight the substantial performance advantage AMD Instinct MI250 holds over its nearest GPU competitor. For example, running the OpenMM amoebapme benchmark on 4x MI250 GPUs delivers up to 2.1x higher performance than the A100, as seen in Figure 5 below.6
Figure 4: LAMMPS Performance9
Figure 5: OpenMM Performance8
Figure 6: AMBER Performance6
The era of exascale computing is here, and the requirements for HPC have taken a massive leap forward. AMD Instinct accelerators, along with EPYC CPUs and the ROCm open software platform, are the first accelerated solution to power an exascale supercomputer, with the Frontier system at ORNL unlocking a new era of computing capabilities for HPC users. The AMD Instinct MI200 series exascale-class products and the ROCm software stack are now readily available to customers and the entire HPC & AI community. The application performance delivered by AMD Instinct GPUs reflects their growing adoption by a broad set of HPC users and shows what a dedicated compute-focused GPU architecture and an open platform can deliver. AMD Instinct MI250 provides outstanding theoretical peak performance, performance per watt, and HPC application performance on several key modules across these HPC applications.
We encourage users to verify these results by running the tests for themselves. AMD benchmark code can be found on the AMD Infinity Hub.4
For more information about AMD Instinct™ MI250 GPU performance, please click here.
Making the ROCm platform even easier to adopt
For ROCm users and developers, AMD is continually looking for ways to make ROCm easier to use and easier to deploy on systems, and to provide learning tools and technical documents to support those efforts.
Mahesh Balasubramanian, Guy Ludden, & Bryce Mackin are in the AMD Instinct™ GPU Product Marketing group at AMD. Their postings are their own opinions and may not represent AMD's positions, strategies, or opinions. Links to third-party sites are provided for convenience and, unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.