Advancing HPC to the next level of sustainability with AMD Instinct accelerators.

High performance computing (HPC) has become an essential part of our modern world, performing the complex simulations and calculations that underpin scientific research, engineering, security and other fields. However, as demand for HPC has grown, often in supercomputers and large data centers, so has concern about its environmental impact. In recent years there has been a growing focus on data center sustainability, given its implications for total cost of ownership and climate. In this blog post, we'll explore some of the key issues around data center energy efficiency and discuss some of the strategies AMD is pursuing to help reduce the environmental impact of HPC.

One of the biggest challenges facing data centers is energy consumption as systems scale to exascale and beyond. Server nodes already consume vast amounts of energy, and HPC nodes packed with CPUs and accelerators consume even more, which makes improving their efficiency an important priority. As demand for HPC compute continues to increase, energy consumption is becoming a gating factor. That consumption strains not only the environment but also the bottom line of data center operators as the industry demands ever more computational performance. As a result, large performance-per-watt improvements are needed as the industry pushes toward the next milestone.

As a designer of cutting-edge server CPUs and GPUs, AMD recognizes its important role in addressing these critical priorities. We are focused on accelerating server energy efficiency, enabling lower data center total cost of ownership (TCO) and delivering high-performance computing (HPC) to help tackle some of the world’s toughest challenges. Back in September 2021, AMD announced an ambitious goal to deliver a 30x increase in energy efficiency for AMD processors and accelerators powering servers for AI training and HPC by 2025. Accomplishing this goal will require AMD to increase the energy efficiency of a compute node at a rate that is more than 2.5x faster than the aggregate industry-wide improvement made during the last five years.
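To put that goal in rough numerical terms, here is a minimal back-of-envelope sketch in Python. It assumes the 30x target is measured against a 2020 baseline and reached by the end of 2025; the exact baseline year is an assumption for illustration, not something stated in this post.

    # Back-of-envelope view of what a 30x efficiency gain implies.
    # Assumption: 30x improvement over a 5-year window (2020 baseline -> 2025).
    TARGET_GAIN = 30.0
    YEARS = 5

    annual_rate = TARGET_GAIN ** (1 / YEARS)  # compound annual improvement factor
    print(f"Implied compound annual efficiency gain: {annual_rate:.2f}x per year")
    # Roughly 1.97x per year, i.e. close to doubling node efficiency every year.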

AMD Instinct™ accelerators enable energy-efficient HPC and AI by offering exceptional performance per watt at both the device and system level, thereby improving the energy efficiency of computation. Meeting the AMD HPC and AI energy efficiency goals requires the next level of thinking around architecture, memory and interconnects, which combine for accelerated system-level improvements. For the AMD Instinct MI200 series accelerators we took a multifaceted approach, and below we explain in more detail some of the key technologies that deliver leadership performance per watt and efficiency.

Architecture Technology – The AMD CDNA™ 2 architecture in the MI200 series represents a major leap forward compared to the prior generation, enhancing the Matrix Core technology for HPC and AI and driving computational capability for double-precision floating-point data and a variety of matrix-multiply primitives. A particular emphasis was scientific computing with FP64 matrix and vector data, enabling exascale levels of performance when scaled up in large systems such as the Oak Ridge National Laboratory Frontier supercomputer and its 1.1 exaflops of performance. These improvements yield roughly a 4X increase in peak FP64 vector TFLOPS and a 2.5X increase in FP64 TFLOPS per watt compared to the prior-generation, CDNA-based MI100 (1).

[Figure: AMD Instinct generational peak FP64 FLOPS]
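As a quick sanity check of those generational ratios, here is a minimal Python sketch using the peak figures from End Note 1. The 300 W board power assumed for the MI100 is not stated in this post and is included only for illustration.

    # Generational FP64 comparison using peak figures from End Note 1.
    mi250x_fp64_vector_tflops = 47.9   # MI250X peak FP64 (vector)
    mi100_fp64_tflops = 11.54          # MI100 peak FP64
    mi250x_power_w = 500               # MI250X OAM power (End Note 1)
    mi100_power_w = 300                # Assumption: typical MI100 board power

    tflops_ratio = mi250x_fp64_vector_tflops / mi100_fp64_tflops
    per_watt_ratio = (mi250x_fp64_vector_tflops / mi250x_power_w) / (mi100_fp64_tflops / mi100_power_w)

    print(f"FP64 vector throughput: {tflops_ratio:.1f}x")      # ~4.2x
    print(f"FP64 throughput per watt: {per_watt_ratio:.1f}x")  # ~2.5x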

Packaging Technology – Chiplets and advanced packaging technology are a huge lever for improving performance and overall efficiency. They make it possible to use different process technologies for different functions, to combine multiple accelerator dies, and to bring accelerators closer to memory. The denser the interconnect, the more efficient the solution can become, helping to lower the costly energy consumed in moving data.

  • MCM – the world’s first multi-chip GPU, designed to maximize compute and data throughput in a single package. The MI250 and MI250X use two AMD CDNA 2 Graphics Compute Dies (GCDs) in a single package to deliver 58 billion transistors in a highly condensed package, with 1.8X more cores and 2.6X higher memory bandwidth than AMD's previous-generation accelerators (2), as the sketch after the image below illustrates. The two GCDs are tied together by a high-speed interface for chip-to-chip communication.

[Figure: AMD Instinct MI250 multi-chip module (MCM)]
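Here is the minimal Python sketch referenced above, checking the 1.8X and 2.6X figures against the compute-unit counts and memory bandwidths listed in End Note 2.

    # Generational MCM comparison using figures from End Note 2.
    mi250x_cus, mi100_cus = 220, 120               # compute units
    mi250x_bw_tbs, mi100_bw_tbs = 3.2768, 1.2288   # peak memory bandwidth, TB/s

    print(f"Compute units: {mi250x_cus / mi100_cus:.2f}x")          # ~1.83x
    print(f"Memory bandwidth: {mi250x_bw_tbs / mi100_bw_tbs:.2f}x") # ~2.67x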

Communication Technology – Communication is key when processing large amounts of data in high performance computers. The ability to move data efficiently between the processor and the outside world is also critical to system performance as you scale up and out. Bringing silicon chips closer together, both physically and electrically, enables dramatic reductions in communication energy while providing higher throughput potential. AMD Infinity Architecture is our high-speed communication highway between CPU and GPU and between GPUs, and below we discuss two areas in our accelerators that improve communication efficiency.

  • Chip-to-chip interconnect – The in-package AMD Infinity Fabric™ interface is one of the key innovations in the AMD CDNA 2 family, connecting the two GCDs within the MI250 or MI250X. It takes advantage of the extremely short distances between the GCDs within the package to operate at 25 Gbps at extremely low power, delivering a theoretical maximum bi-directional bandwidth of up to 400 GB/s between the GCDs; see the sketch after the image below.

[Figure: AMD Instinct MI200 series GCD-to-GCD interconnect]
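The Python sketch below shows one way the 400 GB/s figure can be reached from the 25 Gbps signaling rate. The 16-lane link width and the four in-package links are assumptions used only for illustration; the 25 Gbps rate and the 400 GB/s total come from this post, and the resulting 100 GB/s per link matches End Note 3.

    # Illustrative breakdown of the in-package GCD-to-GCD bandwidth.
    signal_rate_gbps = 25     # per-lane signaling rate (from this post)
    lanes_per_link = 16       # Assumption: lanes per Infinity Fabric link
    links_between_gcds = 4    # Assumption: in-package links between the GCDs

    per_link_one_way_gbs = signal_rate_gbps * lanes_per_link / 8     # GB/s, one direction
    bidir_total_gbs = per_link_one_way_gbs * 2 * links_between_gcds  # both directions, all links

    print(f"Per link, one direction: {per_link_one_way_gbs:.0f} GB/s")    # 50 GB/s
    print(f"GCD-to-GCD bidirectional total: {bidir_total_gbs:.0f} GB/s")  # 400 GB/s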

  • Infinity Architecture – The latest AMD Instinct products use our 3rd-generation Infinity Fabric, which offers a significant improvement over the prior generation. The MI200 series offers up to 8 external Infinity Fabric™ links for GPU P2P or I/O on the AMD Instinct™ MI250 (or MI250X) accelerators, delivering up to 800 GB/s of total theoretical bandwidth and providing up to 235% of the GPU P2P (or I/O) theoretical bandwidth of previous-generation products (3), as the sketch after the image below shows.

[Figure: AMD Instinct MI200 Infinity Fabric connection example]
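The Python sketch referenced above checks the 800 GB/s and 235% figures against End Note 3, which lists 100 GB/s per Infinity Fabric link for the MI250 and a 340 GB/s aggregate I/O peak for the MI100.

    # External GPU P2P / I/O bandwidth comparison using figures from End Note 3.
    mi250_links = 8
    mi250_gbs_per_link = 100                            # peak theoretical per link
    mi250_total_gbs = mi250_links * mi250_gbs_per_link  # 800 GB/s

    mi100_total_gbs = 340  # MI100 aggregate P2P + PCIe Gen4 peak (End Note 3)

    print(f"MI250 aggregate: {mi250_total_gbs} GB/s")
    print(f"Relative to MI100: {mi250_total_gbs / mi100_total_gbs:.0%}")  # ~235%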

CONCLUSION
Today, AMD powers some of the most efficient supercomputers with our current EPYC processors and Instinct accelerators. The Green500 is the industry ranking of supercomputers by energy efficiency, published twice each year and based on measured performance per watt. AMD holds the #2 through #7 spots on the most recent June 2023 Green500 list, a testament to AMD CPU and GPU technology delivering not only some of the most powerful supercomputers but also some of the most energy efficient on the list.
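For reference, the Green500 efficiency metric is simply sustained HPL performance divided by average power during the run. Here is a minimal Python sketch with purely hypothetical input values.

    # Green500-style efficiency: sustained HPL performance per watt.
    def gflops_per_watt(hpl_rmax_tflops: float, avg_power_kw: float) -> float:
        """Convert HPL Rmax (TFLOPS) and average power (kW) into GFLOPS/W."""
        return (hpl_rmax_tflops * 1_000) / (avg_power_kw * 1_000)

    # Hypothetical system: 100,000 TFLOPS sustained at 2,000 kW of average power.
    print(f"{gflops_per_watt(100_000, 2_000):.1f} GFLOPS/W")  # 50.0 GFLOPS/W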

Driving toward the next compute milestone of zettascale requires the next level of thinking around architecture, memory and interconnects, combined for accelerated system-level improvements. AMD has taken the first steps by combining the key pieces into a new accelerator that brings together the best of AMD EPYC™ CPUs and AMD Instinct™ accelerators, targeting even greater generational efficiency and performance gains than the prior MI250 design. This new AMD Instinct accelerator, called the MI300, will be the world’s first integrated data center APU, combining CPU, GPU and shared HBM, and will deliver a breakthrough architecture to power future exascale AI and HPC supercomputers. It is this level of integration on the MI300 that leverages all of the approaches discussed above to achieve those generational gains.

In conclusion, improving compute energy efficiency over the long term is important for reducing operating costs and advancing sustainability goals for high-performance computers, supercomputers, and data centers. The AMD Instinct group aspires to address performance per watt at both the device and system level, thereby improving the efficiency of computation and advancing data center sustainability for HPC and AI.

 

CAUTIONARY STATEMENT:
This blog contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the expected features and benefits of AMD Instinct™ MI300 accelerator; and AMD’s 30x by 2025 energy efficiency goal, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this blog are based on current beliefs, assumptions and expectations, speak only as of the date of this blog and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings, including but not limited to AMD’s most recent reports on Forms 10-K and 10-Q. AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this blog, except as may be required by law.

END NOTES:
1. MI200-01: World’s fastest data center GPU is the AMD Instinct™ MI250X. Calculations conducted by AMD Performance Labs as of Sep 15, 2021, for the AMD Instinct™ MI250X (128GB HBM2e OAM module) accelerator at 1,700 MHz peak boost engine clock resulted in 95.7 TFLOPS peak theoretical double precision (FP64 Matrix), 47.9 TFLOPS peak theoretical double precision (FP64), 95.7 TFLOPS peak theoretical single precision matrix (FP32 Matrix), 47.9 TFLOPS peak theoretical single precision (FP32), 383.0 TFLOPS peak theoretical half precision (FP16), and 383.0 TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance. Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64), 46.1 TFLOPS peak theoretical single precision matrix (FP32), 23.1 TFLOPS peak theoretical single precision (FP32), 184.6 TFLOPS peak theoretical half precision (FP16) floating-point performance. MI200-39: Calculations conducted by AMD Performance Labs as of Sep 15, 2021, for the AMD Instinct™ MI250X (128GB HBM2e OAM module) 500 Watt accelerator at 1,700 MHz peak boost engine clock resulted in 95.7 TFLOPS peak theoretical double precision (FP64 Matrix), 47.9 TFLOPS peak theoretical double precision (FP64 vector) floating-point performance. The AMD Instinct MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.5 TFLOPS peak theoretical double precision (FP64 Matrix), 11.5 TFLOPS peak theoretical double precision (FP64 vector) floating point performance.

2. MI200-27: The AMD Instinct™ MI250X accelerator has 220 compute units (CUs) and 14,080 stream cores. The AMD Instinct™ MI100 accelerator has 120 compute units (CUs) and 7,680 stream cores. MI200-30: Calculations conducted by AMD Performance Labs as of Oct 18th, 2021, for the AMD Instinct™ MI250X and MI250 accelerators (OAM) designed with CDNA™ 2 6nm FinFET process technology at 1,600 MHz peak memory clock resulted in 128GB HBM2e memory capacity and 3.2768 TB/s peak theoretical memory bandwidth performance. MI250X/MI250 memory bus interface is 8,192 bits and memory data rate is up to 3.20 Gbps for total memory bandwidth of 3.2768 TB/s. Calculations by AMD Performance Labs as of Oct 18th, 2021 for the AMD Instinct™ MI100 accelerator designed with AMD CDNA 7nm FinFET process technology at 1,200 MHz peak memory clock resulted in 32GB HBM2 memory capacity and 1.2288 TB/s peak theoretical memory bandwidth performance. MI100 memory bus interface is 4,096 bits and memory data rate is up to 2.40 Gbps for total memory bandwidth of 1.2288 TB/s.

3. MI200-13: Calculations as of Sep 18th, 2021. AMD Instinct™ MI250 built on AMD CDNA™ 2 technology accelerators support AMD Infinity Fabric™ technology providing up to 100 GB/s peak total aggregate theoretical transport data GPU peer-to-peer (P2P) bandwidth per AMD Infinity Fabric link, and include up to eight links providing up to 800 GB/s peak aggregate theoretical GPU (P2P) transport rate bandwidth performance per GPU OAM card. AMD Instinct™ MI100 built on AMD CDNA technology accelerators support PCIe® Gen4 providing up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU per card, and include three links providing up to 276 GB/s peak theoretical GPU P2P transport rate bandwidth performance per GPU card. Combined with PCIe Gen4 support, this provides an aggregate GPU card I/O peak bandwidth of up to 340 GB/s. Server manufacturers may vary configuration offerings yielding different results. MI200-13