Siemens Sourcery CodeBench Lite empowers HPC applications with AMD Instinct™ GPU enhancements

guy_ludden · 12-08-2022

Supercomputing is opening up huge possibilities for advancing science. As the exascale era arrives with unprecedented new systems such as Oak Ridge National Laboratory’s (ORNL) Frontier, the opportunities for computational discovery have exploded. To take advantage of the dramatically increased processing power, scientists need to optimize their applications. A capable toolchain helps them port their software, and one of the most effective is Sourcery CodeBench Lite Edition from Siemens Digital Industries Software. This open-source-based compiler toolchain provides exactly what scientists need to harness the potential of the AMD Instinct™ GPUs used in many of the latest supercomputers.

Compiling for GPU optimization

“We started off by building a FORTRAN compiler for AMD and the GNU Compiler Collection (GCC),” says Catherine Moore, Software Engineering Director, Siemens. “Over the years we've continued our relationship.” Andrew Stubbs, Project Engineering Team Lead, Siemens, adds: “We started working towards the Frontier supercomputer in 2017. The potential of the OpenMP multiprocessing API and FORTRAN toolchains lies in making existing software work better on supercomputers. Historically, many of the workloads that scientists want to run are written in FORTRAN, and one of the best FORTRAN compilers out there is GCC.”

Each new GPU generation, such as the AMD Instinct MI250X accelerators deployed in Frontier, brings new functionality for the compiler to exploit. “The first step is to get the instruction set architecture support into the compiler,” says Stubbs. “The compiler is quite good at producing efficient code for CPUs, but GPUs are different… What we've been doing this last year is trying to improve the compiler’s understanding of optimization opportunities on the GPU. We've been trying to get it to enable vectorization in more cases, to get the maximum potential out of the GPU. But it doesn't change the way an application works. It is meant to be completely invisible apart from the improved performance.”

Adding the latest GPU features

The compiler plays a key role in minimizing the work needed to get maximum performance. “The toolchain is supposed to just work,” says Stubbs. “Application developers add OpenMP or OpenACC directives to their code, so we're also adding support for the latest and greatest OpenMP capabilities to get the most performance. CodeBench Lite is our brand name for freely downloadable, open-source-based command-line toolchains, which are available for different environments. We have been working on AMD CodeBench Lite, which should be able to compile any standards-compliant FORTRAN program and most OpenMP and OpenACC code to run on AMD GPUs.”
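To illustrate the directive-based approach Stubbs describes, the sketch below writes the same SAXPY loop (y = a·x + y) once with an OpenMP `target` construct and once with an OpenACC `parallel loop`. The function names `saxpy_omp` and `saxpy_acc` are hypothetical, and nothing here is specific to CodeBench Lite. With GCC, the directives take effect only when compiling with `-fopenmp` or `-fopenacc`, and actual offloading also assumes a GCC build configured for the AMD GPU target. Otherwise the pragmas are ignored and both loops run serially on the host.

```c
#include <stddef.h>

/* y = a*x + y with OpenMP offloading. map(to:) copies x to the device;
   map(tofrom:) copies y to the device and back. If the code is compiled
   without -fopenmp, the pragma is ignored and the loop runs on the host. */
void saxpy_omp(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* The same loop expressed with OpenACC: copyin(x) copies x to the device,
   copy(y) copies y in and back out. Ignored unless built with -fopenacc. */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The loop body is identical in both versions. Only the directive changes, which leaves it to the compiler, rather than the programmer, to decide how the iterations are spread across the GPU's threads and vector lanes.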

“Our latest release (November 2022) addresses the optimization requirements that exist in the current version of GCC for targeting AMD GPUs,” says Moore. “We currently support AMD GPUs including [AMD architectures] GCN 3, Vega 10, Vega 20, the [Radeon Instinct™] MI50, the [AMD Instinct] MI100, and the MI200,” adds Stubbs. “The most interesting thing for us with the AMD Instinct MI200 is that it adds support for unified shared memory. Once enabled in your code, you don't have to explicitly move data between the CPU and GPU.” This feature can unlock considerable performance benefits. “If you are dealing with sparse data sets where you don't know precisely which bytes you're going to use, or there's just too much of it to fit in the device memory, then that starts to bottleneck,” says Stubbs. “You end up copying more data than you need or can fit on the device. The unified shared memory system means that the pages are only copied to the device on demand… so overall performance can be much better.”
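The following is a minimal sketch of what enabling unified shared memory can look like in C. It assumes an OpenMP 5.0 implementation and a device, such as one from the MI200 series, that supports the feature. The `requires unified_shared_memory` directive is standard OpenMP, while the function `sum_squares` is a hypothetical example. Inside the target region, the heap-allocated array is read through the host pointer with no `map` clause for it, which is the explicit data movement the quote says you no longer have to write. Implementations differ in what they do when a device cannot meet the requirement, so check the toolchain's documentation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Must appear at file scope, before any target construct in this file. */
#pragma omp requires unified_shared_memory

/* Sum of squares over a host-allocated array. The array itself is not
   mapped: under unified shared memory the device dereferences the host
   pointer directly. Only the scalar result is mapped explicitly.
   Without -fopenmp the pragmas are ignored and this is a serial loop. */
double sum_squares(const double *data, size_t n)
{
    double total = 0.0;
    #pragma omp target teams distribute parallel for \
        map(tofrom: total) reduction(+: total)
    for (size_t i = 0; i < n; i++)
        total += data[i] * data[i];
    return total;
}
```

A caller can simply `malloc` the array, fill it on the host, and pass the pointer. There is no `target enter data` or `map(to: data[0:n])` step.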

Maximum performance, minimum effort

CodeBench Lite makes taking advantage of features like this seamless. “Open-source compilers are not packaged [as ready-to-use toolchains],” says Moore. “If an engineer wants to use gfortran, for example, that engineer would have to go to the GCC website, figure out how to build it, figure out what other open-source components are necessary to make it a working toolchain, and so on. With CodeBench Lite we implement features and performance enhancements, package them, and test everything together so that you can download a binary, install it, and be ready to go.”

“There are best practices for OpenMP to get maximum performance out of the device,” says Stubbs. “Part of this project is to try to make the compiler apply those best practices automatically, because users don't know how to. The CodeBench Lite toolchain that you can download right now is ahead of upstream GCC in terms of OpenMP and OpenACC features. It has more OpenMP 5.0 and 5.1 features than you would get from any other source. Part of our job is to improve the GCC ecosystem, but support for new features lands first in our toolchain. We are currently testing on AMD Instinct MI200 devices. We're making sure that when you download CodeBench Lite, you get something that we've certified and that gives you the best possible performance on the latest supercomputing platforms.”
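One widely recommended OpenMP practice is to keep data resident on the device across consecutive kernels. It is offered here only as an example of the kind of best practice Stubbs mentions, not as a description of what CodeBench Lite does automatically. In the sketch below, which uses a hypothetical function `scale_then_add`, an enclosing `target data` region copies `x` and `y` to the device once and copies `y` back once, instead of mapping both arrays around each of the two `target` regions.

```c
#include <stddef.h>

/* Computes y = a*y + x as two dependent kernels that share one device
   data environment. Without -fopenmp the pragmas are ignored and both
   loops run serially on the host with the same result. */
void scale_then_add(size_t n, float a, const float *x, float *y)
{
    /* Copy x and y to the device on entry; copy y back on exit only. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* Kernel 1: y = a * y. The pointer y refers to an array that is
           already present on the device, so no array data is transferred. */
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++)
            y[i] *= a;

        /* Kernel 2: y = y + x, reusing the device copies of x and y. */
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++)
            y[i] += x[i];
    }
}
```

With separate mappings on each kernel, `y` would make two round trips between host and device. Here it makes one, and the saving grows with the number of kernels that touch the same arrays.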

To get the latest version of CodeBench Lite for AMD, visit: SIEMENS.com/Sourcery-AMD

Helpful Resources:

Learn more about our latest AMD Instinct accelerators, including the new Instinct MI210 PCIe® form factor GPU recently added to the AMD Instinct MI200 series, and supporting partner server solutions, in our AMD Instinct Server Solutions Catalog.
The ROCm web pages provide an overview of the platform and what it includes, along with markets and workloads it supports.
ROCm Information Portal is a new one-stop portal for users and developers that posts the latest versions of ROCm along with API and support documentation. This portal also now hosts the ROCm Learning Center to help introduce the ROCm platform to new users, as well as to provide existing users with curated videos, webinars, labs, and tutorials to help in developing and deploying systems on the platform. It replaces the former documentation and learning sites.
AMD Infinity Hub gives you access to HPC applications and ML frameworks packaged as containers and ready to run. You can also access the ROCm Application Catalog, which includes an up-to-date listing of ROCm enabled applications.
AMD Accelerator Cloud offers remote access to test code and applications in the cloud, on the latest AMD Instinct™ accelerators and ROCm software.

Bryce Mackin is in the AMD Instinct™ GPU Product Marketing Group for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.