The world welcomes the AMD Instinct™ MI210 accelerator

guy_ludden · ‎03-22-2022

Last fall we announced that the AMD Instinct™ MI200 series accelerators were bringing the Oak Ridge National Laboratory’s Frontier system into the exascale era. Ever since, the world has been waiting to use this technology to accelerate mainstream HPC and AI/ML workloads in enterprise data centers.

The wait is over. Today, the AMD Instinct MI210 accelerator is launching to benefit the full spectrum of data center computing. It uses the same technologies that power many of the world’s fastest supercomputers, but in a PCIe® form-factor package, bringing industry performance leadership in accelerated compute for double precision (FP64) to mainstream innovators in HPC and AI.1

Architectural Prowess

The AMD CDNA™ 2 architecture is our purpose-built, optimized architecture designed to do one thing very well: accelerate compute-intensive HPC and AI/ML workloads. AMD CDNA 2 includes 2nd generation AMD Matrix Cores bringing new FP64 Matrix capabilities, optimized instructions, and more memory capacity with faster memory bandwidth than previous-gen AMD Instinct GPU compute products to feed data-hungry workloads.2

Our 3rd Gen AMD Infinity Fabric™ technology brings advanced platform connectivity and scalability enabling fully connected dual and quad P2P GPU hives through three Infinity Fabric links delivering up to 300 GB/s (Dual) and 600 GB/s (Quad) of total aggregate peak P2P theoretical I/O bandwidth for lightning-fast P2P communications and data sharing.3 Finally, the AMD ROCm™ 5 open software platform enables your HPC and AI/ML codes to tap the full capabilities of the MI210 and the rest of your GPU accelerators with a single code base. The AMD Infinity Hub gives you ready-to-run containers preconfigured with many popular HPC and AI/ML applications. Putting the MI210 to work in your data center couldn’t be easier.
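One way to see where the 300 GB/s (dual) and 600 GB/s (quad) figures come from is to count links. The sketch below is only an illustration, not AMD's published methodology. It assumes each of the three Infinity Fabric links carries an equal share of the 300 GB/s per-card figure (100 GB/s per link), and that every GPU in a hive uses all three of its links. The function and constant names are hypothetical.

```python
# Illustrative arithmetic only. Assumption: the 300 GB/s per-card P2P figure
# is split evenly across the three Infinity Fabric links (100 GB/s per link).
LINKS_PER_GPU = 3
PER_CARD_P2P_GBPS = 300.0
PCIE_GEN4_GBPS = 64.0  # peak theoretical CPU-to-GPU bandwidth per card
PER_LINK_GBPS = PER_CARD_P2P_GBPS / LINKS_PER_GPU


def hive_links(num_gpus, links_per_gpu=LINKS_PER_GPU):
    """Count GPU-to-GPU links in a hive where every GPU uses all of its links."""
    # Each link joins two GPUs, so the total port count is halved.
    return num_gpus * links_per_gpu // 2


def hive_bandwidth_gbps(num_gpus):
    """Aggregate peak theoretical P2P bandwidth across all links in the hive."""
    return hive_links(num_gpus) * PER_LINK_GBPS


dual = hive_bandwidth_gbps(2)  # 3 links between the two GPUs
quad = hive_bandwidth_gbps(4)  # 6 links, one between every pair of GPUs
card_io = PER_CARD_P2P_GBPS + PCIE_GEN4_GBPS  # per-card aggregate I/O

print(f"dual hive: {dual:.0f} GB/s, quad hive: {quad:.0f} GB/s")
print(f"per-card aggregate I/O: {card_io:.0f} GB/s")
```

Under these assumptions the link counts reproduce the dual and quad hive figures quoted above, along with the 364 GB/s per-card aggregate given in the endnotes.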

World’s Fastest PCIe® Accelerator for HPC

How does it perform, you ask? The AMD Instinct MI210 is the world’s fastest double-precision (FP64) data center PCIe accelerator, with up to 22.6 teraflops FP64 and 45.3 teraflops FP64 Matrix peak theoretical performance for HPC, delivering a 2.3x FP64 performance boost over NVIDIA Ampere A100 GPUs.1 MI210 also brings up to 181 teraflops FP16 and BF16 performance for machine learning training. And it hosts 64 GB of HBM2e memory with 1.6 TB/s of memory bandwidth, 33% more than previous-gen AMD Instinct GPU compute products, to handle the most demanding workloads.2
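The headline ratios can be checked against the peak theoretical numbers listed in the endnotes. The following is a minimal sketch using only those published figures. The variable and function names are illustrative, and the results describe peak theoretical rates, not measured application performance.

```python
# Peak theoretical figures taken from the endnotes of this post.
MI210_FP64_TFLOPS = 22.6
MI210_FP64_MATRIX_TFLOPS = 45.3
A100_FP64_TFLOPS = 9.7
A100_FP64_TENSOR_TFLOPS = 19.5
MI210_MEM_BW_TBPS = 1.6384
MI100_MEM_BW_TBPS = 1.2288


def ratio(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


def percent_more(new, old):
    """Return the relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


fp64_boost = ratio(MI210_FP64_TFLOPS, A100_FP64_TFLOPS)
matrix_boost = ratio(MI210_FP64_MATRIX_TFLOPS, A100_FP64_TENSOR_TFLOPS)
bw_gain = percent_more(MI210_MEM_BW_TBPS, MI100_MEM_BW_TBPS)

print(f"FP64 vector: {fp64_boost:.1f}x, FP64 matrix: {matrix_boost:.1f}x")
print(f"memory bandwidth gain over MI100: {bw_gain:.0f}%")
```

Both the vector and the matrix comparison round to the 2.3x figure, and the bandwidth gain rounds to the 33% quoted above.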

So, how does this translate to real-world applications? Visit the AMD Instinct benchmark page to see how AMD Instinct accelerators stack up against the competition. You may be surprised.

Broad Adoption, Software Ready

Now you want to know: how do you get access to it? We work with a broad range of technology partners to help make sure that you get the best out of your investment.

First, start by visiting our extensive partner server solutions HPC and AI catalog pages to choose a qualified platform from your favorite server vendor. Next, check out the ROCm 5 open software platform that will help to bring your HPC codes alive. Or make it even easier by visiting the AMD Infinity Hub and downloading optimized HPC codes encapsulated in containers, ready to run. If you want to test drive the latest AMD hardware and software before you buy, visit the AMD Accelerator Cloud (AAC) to remotely access and gain hands-on experience with our next-gen high performance technologies.

Additional Resources:

Learn more about the AMD Instinct™ MI200 series accelerators

Learn more about the AMD Instinct™ MI300 Series accelerators

Download the full AMD Instinct™ Accelerators Server Solutions Catalog

To see the full list of available application and framework containers, visit the AMD Infinity Hub.

Learn more about the AMD ROCm™ open software platform

Access the latest ROCm drivers, support docs, and ROCm education materials on the AMD ROCm™ Developer Hub.

Guy Ludden is Sr. Product Marketing Mgr. for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third-party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

Endnotes:

1. Calculations conducted by AMD Performance Labs as of Jan 14, 2022, for the AMD Instinct™ MI210 (64GB HBM2e PCIe® card) accelerator at 1,700 MHz peak boost engine clock resulted in 45.3 TFLOPS peak theoretical double precision (FP64 Matrix) and 22.6 TFLOPS peak theoretical double precision (FP64). Calculations conducted by AMD Performance Labs as of Sep 18, 2020, for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64). Published results on the NVIDIA Ampere A100 (80GB) GPU accelerator, boost engine clock of 1410 MHz, resulted in 19.5 TFLOPS peak double precision tensor cores (FP64 Tensor Core) and 9.7 TFLOPS peak double precision (FP64). https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper..., page 15, Table 1. MI200-41
2. Calculations conducted by AMD Performance Labs as of Jan 27, 2022, for the AMD Instinct™ MI210 (64GB HBM2e) accelerator (PCIe®) designed with AMD CDNA™ 2 architecture 6nm FinFet process technology at 1,600 MHz peak memory clock resulted in 64 GB HBM2e memory capacity and 1.6384 TB/s peak theoretical memory bandwidth performance. MI210 memory bus interface is 4,096 bits and memory data rate is 3.20 Gbps for total memory bandwidth of 1.6384 TB/s ((3.20 Gbps*(4,096 bits))/8). Calculations conducted by AMD Performance Labs as of Sep 18, 2020, for the AMD Instinct™ MI100 (32GB HBM2) accelerator (PCIe®) designed with AMD CDNA™ architecture 7nm FinFet process technology at 1,502 MHz peak clock resulted in 32 GB HBM2 memory capacity and 1.2288 TB/s peak theoretical memory bandwidth performance. MI100 memory bus interface is 4,096 bits and memory data rate is 2.40 Gbps for total memory bandwidth of 1.2288 TB/s ((2.40 Gbps*(4,096 bits))/8). MI200-42
3. Calculations as of Jan 27, 2022. AMD Instinct™ MI210 built on AMD CDNA™ 2 technology accelerators support PCIe® Gen4 providing up to 64 GB/s peak theoretical data bandwidth from CPU to GPU per card. AMD Instinct™ MI210 CDNA 2 technology-based accelerators include three Infinity Fabric™ links providing up to 300 GB/s peak theoretical GPU to GPU or Peer-to-Peer (P2P) bandwidth performance per GPU card. Combined with PCIe Gen4 support, this provides an aggregate GPU card I/O peak bandwidth of up to 364 GB/s. Dual-GPU hives: One dual-GPU hive provides up to 300 GB/s peak theoretical P2P performance. Four-GPU hives: One four-GPU hive provides up to 600 GB/s peak theoretical P2P performance. Dual four-GPU hives in a server provide up to 1.2 TB/s total peak theoretical direct P2P performance per server. AMD Infinity Fabric link technology not enabled: One four-GPU hive provides up to 256 GB/s peak theoretical P2P performance with PCIe® 4.0. AMD Instinct™ MI100 built on AMD CDNA technology accelerators support PCIe® Gen4 providing up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU per card. AMD Instinct™ MI100 CDNA technology-based accelerators include three Infinity Fabric™ links providing up to 276 GB/s peak theoretical GPU to GPU or Peer-to-Peer (P2P) bandwidth performance per GPU card. Combined with PCIe Gen4 support, this provides an aggregate GPU card I/O peak bandwidth of up to 340 GB/s. One four-GPU hive provides up to 552 GB/s peak theoretical P2P performance. Dual four-GPU hives in a server provide up to 1.1 TB/s total peak theoretical direct P2P performance per server. AMD Infinity Fabric link technology not enabled: One four-GPU hive provides up to 256 GB/s peak theoretical P2P performance with PCIe® 4.0. Server manufacturers may vary configuration offerings yielding different results. MI200-43
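The memory bandwidth formula given in the endnotes, (data rate in Gbps * bus width in bits) / 8, can be written out as a short sketch. The function name is hypothetical and the inputs are the per-pin data rates and bus widths quoted in the endnotes. The result is a peak theoretical figure, not a measured one.

```python
def peak_memory_bandwidth_tbps(data_rate_gbps, bus_width_bits):
    """Peak theoretical memory bandwidth in TB/s.

    data_rate_gbps: per-pin data rate in gigabits per second
    bus_width_bits: width of the memory bus interface in bits
    """
    # Gbps per pin times number of pins gives Gb/s; divide by 8 for GB/s.
    gigabytes_per_second = data_rate_gbps * bus_width_bits / 8
    # Decimal units: 1 TB/s = 1000 GB/s.
    return gigabytes_per_second / 1000


mi210_bw = peak_memory_bandwidth_tbps(3.20, 4096)  # HBM2e figures from the endnotes
mi100_bw = peak_memory_bandwidth_tbps(2.40, 4096)  # HBM2 figures from the endnotes

print(f"MI210: {mi210_bw:.4f} TB/s, MI100: {mi100_bw:.4f} TB/s")
```

With the quoted inputs this reproduces the 1.6384 TB/s and 1.2288 TB/s values, and the ratio between them gives the roughly 33% bandwidth increase cited in the post.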