High Performance Linpack (HPL) is a portable implementation of the Linpack benchmark used to measure a system's floating-point computing power. HPL solves a random dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers and reports the floating-point execution rate of the underlying hardware.
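To make the measurement concrete, here is a minimal single-process sketch in Python of what an HPL-style run computes: it solves a random dense system, checks the residual, and converts the elapsed time to a rate using the conventional operation count of 2/3·N³ + 3/2·N². The function names (hpl_flop_count, solve_dense, benchmark) are hypothetical and chosen for illustration; this is not the HPL or rocHPL code, and pure Python will report rates many orders of magnitude below what tuned implementations achieve.

```python
import random
import time


def hpl_flop_count(n):
    """Conventional operation count credited for solving an n-by-n system."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies so the inputs stay intact
    b = b[:]
    for k in range(n):
        # Choose the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def benchmark(n, seed=0):
    """Return (gflops, max residual) for one random n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return hpl_flop_count(n) / elapsed / 1e9, residual


if __name__ == "__main__":
    gflops, residual = benchmark(60)
    print(f"N=60: {gflops:.6f} GFLOP/s, max residual {residual:.2e}")
```

The rate is always derived from the fixed operation count and the wall-clock time, not from counting the instructions a particular implementation actually executes, which is what makes results comparable across systems.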
In a complex HPC system, no single computational task can accurately reflect overall performance. To standardize on a set of computational operations for measuring an HPC system's capabilities, the industry has adopted the HPL benchmark as a standard, and it is widely used by leading supercomputing centers.
HPL is often one of the first programs run on large computer installations to produce a result that can be submitted to TOP500. It serves as the reference benchmark for the TOP500 list, which ranks a system against supercomputers worldwide.
The June 2022 TOP500 listing showcased the Frontier supercomputer at Oak Ridge National Laboratory, which uses AMD Instinct™ accelerators and AMD EPYC™ processors and reached the historic milestone of exceeding 1 exaflop of performance on HPL. AMD is now open sourcing the rocHPL code branch used in the exascale run on Frontier, giving the industry access to rocHPL code that runs on a broad range of AMD Instinct accelerator-powered platforms. Together with the previously open sourced rocHPCG code, rocHPL continues the commitment from AMD to the open-source philosophy. AMD is also the only HPC vendor to have open sourced both HPL and HPCG codes rather than binaries.
rocHPL and rocHPCG are benchmark ports based on the HPL and HPCG benchmark applications, implemented on top of the AMD ROCm™ platform, runtime, and toolchains. Both are written in the HIP programming language and optimized for the latest AMD Instinct™ GPUs. HIP (Heterogeneous-computing Interface for Portability) is an open programming paradigm that not only runs on AMD GPUs but is also portable to other vendors' GPUs and CPUs.
AMD rocHPL is an innovative implementation that highlights the unique advantages of a system powered by 3rd Gen AMD EPYC processors and AMD Instinct MI250 accelerators. rocHPL performs the panel factorization on the CPUs, while performing most of the compute work on the Instinct accelerators. The unique system design, which unifies these EPYC processors and Instinct accelerators through the cache-coherent AMD Infinity Fabric™ interconnect, is what enabled Frontier to attain the performance necessary to achieve exascale.
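The division of labor described above follows the shape of a blocked LU factorization: a narrow panel of columns is factored with pivoting, and the result is then applied to the much larger trailing matrix. The Python sketch below illustrates that structure on a single process. It is an assumption-laden illustration, not the rocHPL implementation: the function names (factor_panel, update_trailing, blocked_lu, reconstruction_error) and the block size nb are hypothetical, and everything here runs on the CPU. The comments only mark which step corresponds to the CPU work and which to the accelerator work named in the text.

```python
import random


def factor_panel(a, perm, k, nb):
    """Factor columns k..k+nb-1 with partial pivoting (the CPU step in the text)."""
    n = len(a)
    end = min(k + nb, n)
    for j in range(k, end):
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        # Swap whole rows so columns left and right of the panel stay consistent.
        a[j], a[p] = a[p], a[j]
        perm[j], perm[p] = perm[p], perm[j]
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]  # multiplier, stored in place as L
            for c in range(j + 1, end):
                a[i][c] -= a[i][j] * a[j][c]
    return end


def update_trailing(a, k, end):
    """Apply the panel to the columns right of it (the accelerator step in the text)."""
    n = len(a)
    # Triangular solve: finish rows k..end-1 of U in the trailing columns.
    for j in range(k, end):
        for i in range(j + 1, end):
            for c in range(end, n):
                a[i][c] -= a[i][j] * a[j][c]
    # Matrix-matrix update of the remaining block; this dominates the work.
    for i in range(end, n):
        for j in range(k, end):
            lij = a[i][j]
            for c in range(end, n):
                a[i][c] -= lij * a[j][c]


def blocked_lu(a, nb):
    """Return (lu, perm) such that row i of L*U equals row perm[i] of a."""
    a = [row[:] for row in a]
    perm = list(range(len(a)))
    k = 0
    while k < len(a):
        end = factor_panel(a, perm, k, nb)
        update_trailing(a, k, end)
        k = end
    return a, perm


def reconstruction_error(original, lu, perm):
    """Largest entry of |L*U - P*original|, with L unit lower triangular."""
    n = len(lu)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = sum(
                (lu[i][t] if t < i else 1.0) * lu[t][j]
                for t in range(min(i, j) + 1)
            )
            worst = max(worst, abs(s - original[perm[i]][j]))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    m = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(8)]
    lu, perm = blocked_lu(m, nb=3)
    print("max reconstruction error:", reconstruction_error(m, lu, perm))
```

The panel step touches only nb columns and is dominated by pivot searches and row swaps, while the trailing update is bulk matrix arithmetic. That difference in character is why splitting the two steps between processors and accelerators is a natural fit.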
The plots below show a performance advantage of over 2.8X on rocHPL and over 1.6X on rocHPCG for four AMD Instinct MI250 accelerators with the ROCm software stack, compared with A100 GPUs from another GPU vendor.
Figure 1: HPL Performance on AMD Instinct™ MI250 accelerators.
Figure 2: HPCG Performance on AMD Instinct™ MI250 accelerators.
The open-source code is available at the links below.
rocHPL: https://github.com/ROCmSoftwarePlatform/rocHPL
rocHPCG: https://github.com/ROCmSoftwarePlatform/rocHPCG
Containers for HPL and HPCG are available on AMD Infinity Hub, along with instructions.
Making the ROCm platform even easier to adopt
For ROCm users and developers, AMD is continually looking for ways to make ROCm easier to use and easier to deploy, and to provide learning tools and technical documents that support those efforts.
Helpful Resources:
Bryce Mackin is in the AMD Instinct™ GPU product Marketing Group for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
Endnotes: