Pushing the boundaries of simulation with PIConGPU on AMD Instinct™ MI250X GPUs

guy_ludden · ‎05-31-2023

The quest to understand the mysteries of the universe has led scientists to explore the inner workings of plasma and laser interactions. At the forefront of this research is a team of domain experts led by Sunita Chandrasekaran, Associate Professor of Computer and Information Sciences at the University of Delaware, and Dr. Alexander Debus with a group of computational laser-plasma scientists at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) in Germany. Since we last wrote about their project in 2021, the team has made significant advancements in their pioneering work of simulating these interactions using PIConGPU, a plasma physics simulation code that is highly optimized for GPU acceleration and capable of running at scale on some of the world's most powerful supercomputers. With the help of the AMD Instinct™ MI250X GPUs, the team has achieved groundbreaking results that could revolutionize x-ray light sources for material science, radiation therapy for cancer treatment, and particle accelerators as we know them today.

Handling complex and massive simulations

One of the standout features of the MI250X accelerator is its large 128GB HBM2e memory capacity, which allows the processor to handle more extensive and complex simulations. Chandrasekaran explains that “Summit's NVIDIA Volta GPU has 16GB of HBM memory, with 32GB V100s on some high-memory nodes. However, Frontier's AMD MI250X provides 128GB of total memory. That helps us solve higher-fidelity problems!”

Energy efficiency is a critical factor for modern HPC systems. It is vital that the code be able to simultaneously utilize substantial amounts of compute resources, while not drawing excessive megawatts of power. To that end, Chandrasekaran notes that, “When PIConGPU uses 98% of Frontier’s compute capacity, or about 37,000 AMD Instinct MI250X GPUs, it sustains a power draw of 21 MW compared to Frontier’s peak power draw of 29MW. Our scientific case study does not reach the peak power since PIConGPU, like most real-world applications, doesn’t exclusively perform computations for the entire run but also engages in less energy-consuming activities like data movement and communication.”
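To put the quoted power figures in proportion, the short sketch below works through the arithmetic. It uses only the numbers from the quote above; the per-GPU value is an illustrative derived quantity (total system power divided by GPUs in use), not a measurement of the GPUs themselves.

```python
# Figures quoted in the paragraph above.
sustained_power_mw = 21.0     # power draw during the PIConGPU run
peak_power_mw = 29.0          # Frontier's peak power draw
gpus_in_use = 37_000          # "about 37,000" MI250X GPUs

fraction_of_peak = sustained_power_mw / peak_power_mw

# Whole-system power divided by GPUs in use. This also counts everything
# else drawing power in the system, so it is an upper bound on each GPU's
# share rather than a GPU-level measurement.
system_watts_per_gpu = sustained_power_mw * 1e6 / gpus_in_use

print(f"fraction of peak power: {fraction_of_peak:.0%}")
print(f"system power per GPU in use: {system_watts_per_gpu:.0f} W")
```

In other words, the run sustains roughly 72% of Frontier's peak power draw while occupying 98% of the machine.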

Additionally, the MI250X accelerator can handle the large amounts of data generated by PIConGPU simulations. Chandrasekaran explains, “reading, writing, and analyzing petabytes of data is a huge challenge. Our initial data analysis must be conducted asynchronously during the simulation before transferring or storing any data. Here is where Frontier is very useful. This is one of the first systems where storing the current state of a large-scale plasma simulation as large as one petabyte is even possible!”

PIConGPU uses the Alpaka backend and the Particle-in-Cell (PIC) algorithm for its science case simulations. Alpaka, an open-source abstraction library written in C++17, provides performance portability across accelerators by abstracting the underlying levels of parallelism. Alpaka operates on top of the AMD HIP and AMD ROCm™ platforms, and it is where most of the porting is done, rather than within PIConGPU itself. As the AMD ROCm platform is an open-source software ecosystem, it helps enable PIConGPU developers to understand the software’s underpinnings, file bugs, and interact with AMD developers.
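For readers unfamiliar with the PIC algorithm, the sketch below shows the basic cycle it repeats every time step: deposit particle charge on a grid, solve for the field on the grid, interpolate the field back to the particles, and push the particles. This is a deliberately minimal, one-dimensional electrostatic toy in normalized units, written in plain Python. It is not PIConGPU's implementation, which is a relativistic, electromagnetic 3D code running through Alpaka on GPUs, and all function names here are hypothetical.

```python
def wrap(x, length):
    """Map a position into the periodic domain of the given length."""
    return x % length


def deposit_charge(positions, charge, n_cells, dx):
    """Step 1: spread each particle's charge linearly onto its two nearest grid nodes."""
    rho = [0.0] * n_cells
    for x in positions:
        s = x / dx
        left = int(s)        # index of the node to the left of the particle
        frac = s - left      # fractional distance from that node, in cells
        rho[left % n_cells] += charge * (1.0 - frac) / dx
        rho[(left + 1) % n_cells] += charge * frac / dx
    return rho


def solve_field(rho, dx):
    """Step 2: integrate Gauss's law dE/dx = rho (normalized units, eps0 = 1).

    The mean charge density is subtracted, which acts as a uniform
    neutralizing background so that a periodic solution exists.
    """
    n = len(rho)
    mean_rho = sum(rho) / n
    field = [0.0] * n
    for i in range(1, n):
        field[i] = field[i - 1] + (rho[i - 1] - mean_rho) * dx
    mean_field = sum(field) / n
    return [e - mean_field for e in field]


def gather_field(field, x, dx):
    """Step 3: interpolate the grid field linearly to a particle position."""
    n = len(field)
    s = x / dx
    left = int(s)
    frac = s - left
    return field[left % n] * (1.0 - frac) + field[(left + 1) % n] * frac


def pic_step(positions, velocities, charge, mass, n_cells, dx, dt):
    """Advance all particles by one time step; returns new positions and velocities."""
    length = n_cells * dx
    rho = deposit_charge(positions, charge, n_cells, dx)
    field = solve_field(rho, dx)
    new_x, new_v = [], []
    for x, v in zip(positions, velocities):
        v = v + (charge / mass) * gather_field(field, x, dx) * dt  # Step 4a: kick
        x = wrap(x + v * dt, length)                               # Step 4b: drift
        new_x.append(x)
        new_v.append(v)
    return new_x, new_v


# Usage: three electron-like particles on an 8-cell periodic grid.
xs, vs = [0.1, 0.35, 0.8], [0.0, 0.0, 0.0]
for _ in range(10):
    xs, vs = pic_step(xs, vs, charge=-1.0, mass=1.0, n_cells=8, dx=0.125, dt=0.01)
```

Each particle only interacts with the grid, never directly with other particles, which is what makes the method map well onto massively parallel hardware such as GPUs.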

PIConGPU simulations yield scientific insights and breakthroughs

The PIConGPU simulations have already yielded important scientific insights into the behavior of plasma and laser interactions. Specifically, the simulations have shown promise in testing new laser-plasma accelerator designs, achieving giga-electron volts, and even coming close to tera-electron volts. These breakthroughs have significant implications for developing new particle accelerators that will be crucial for advancing scientific understanding. “PIConGPU simulations on Frontier will allow us to build realistic models of novel, more scalable laser-plasma accelerators that potentially reach unprecedented particle energies,” says Alexander Debus, who, together with his team, received a 2023 INCITE award at Oak Ridge Leadership Computing Facility (OLCF) for simulation time on Frontier.

The AMD Instinct MI250X GPU's ability to handle both compute-intensive and memory-bound applications like PIConGPU has helped enable the team to run more extensive and complex simulations than ever before, yielding important insights into the behavior of plasma and laser interactions. The team can now simulate the proton acceleration process with extreme accuracy, enabling the design of incredibly effective proton therapy treatments for cancer patients. This is a significant development, as current radiation therapy treatments can cause damage to healthy tissues surrounding the cancerous area. The ability of ultrashort proton beams to target tumors more precisely can improve outcomes for patients and potentially reduce treatment side effects. “The so-called flash effect from plasma-based proton accelerators has a tremendous potential for proton-radiotherapy. Large-scale laser-plasma simulations are essential to understand its rich plasma dynamics to design and refine current flagship experiments,” explains Michael Bussmann, founding manager of Center for Advanced Systems Understanding (CASUS) and group leader for Computational Radiation Physics at HZDR.

Figure 1: PIConGPU simulation of a cryogenic hydrogen jet interaction with a high-intensity laser beam. (Courtesy of Richard Pausch and Rene Widera from HZDR)

The images in Figure 1 depict PIConGPU simulations of a cryogenic hydrogen jet interacting with a high-intensity laser beam, a model of an experiment recently performed at the DRACO laser system at HZDR. In the process, the laser disrupts the hydrogen jet. An expanding cloud of plasma electrons creates strong electric currents and thus electromagnetic fields that give rise to proton acceleration. These accelerated protons could potentially be used for cancer treatment in the future. Using Alpaka and the ISAAC in-situ library plug-in, and running on 1024 AMD Instinct MI250X GPUs, makes live rendering and steering of PIConGPU simulations possible. These capabilities allow HZDR developers to steer the simulation remotely, even from across the Atlantic Ocean.

Realizing transformative performance

An important metric for the PIConGPU simulations is tera updates per second (TUPS), a throughput measure of how many trillion updates the code performs each second. On Summit, the team achieved 14.7 TUPS, while on Frontier, they achieved 65.3 TUPS, a 4.4x increase (single precision). This increase in performance was achieved using 9,400 Frontier nodes, each with four MI250X GPUs, for a total of 37,600 MI250X GPUs. The boost in performance was even more pronounced when using double precision, where the team gained an increase of 8.5x. René Widera, PIConGPU's lead programmer, says, “At double precision, PIConGPU simulations lead to runtimes that are only marginally longer compared to those at single precision. For PIConGPU on the AMD Instinct MI250X GPU, the extra accuracy of double precision basically comes for free.”
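The single-precision figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the paragraph; the per-GPU throughput is a derived average added for illustration and is not a figure reported by the team.

```python
# Single-precision figures quoted in the paragraph above.
summit_tups = 14.7        # tera updates per second on Summit
frontier_tups = 65.3      # tera updates per second on Frontier
frontier_nodes = 9_400
gpus_per_node = 4

single_precision_speedup = frontier_tups / summit_tups
total_gpus = frontier_nodes * gpus_per_node

# Derived average: 1 TUPS = 1e12 updates per second.
updates_per_gpu_per_second = frontier_tups * 1e12 / total_gpus

print(f"speed-up over Summit: {single_precision_speedup:.1f}x")
print(f"MI250X GPUs used: {total_gpus:,}")
print(f"average updates per GPU per second: {updates_per_gpu_per_second:.2e}")
```

The same ratio cannot be reproduced here for double precision, because the article gives only the 8.5x factor and not the underlying TUPS values.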

The team worked very closely with the AMD hardware, software, and tools experts to realize Frontier’s potential and work on critical science. This is a truly collaborative and interdisciplinary project among domain and computer scientists, AMD, HPE/Cray, and ORNL. Furthermore, Chandrasekaran has also used this opportunity to train the next-generation workforce, including PhD and undergraduate students who have been working on this project.

“The MI250X accelerator has been a game-changer for our simulations,” says Chandrasekaran. “It enables us to run simulations that were simply not possible with hardware that the code has run on before.” The research led by Chandrasekaran and her team demonstrates the enormous potential of high-performance computing and GPU acceleration in enabling simulations of unprecedented scale and resolution. The AMD Instinct MI250X GPU has played a crucial role in these breakthroughs, providing the team with the memory capacity, efficiency, and performance required to tackle some of modern science's most complex and essential questions.

Making the ROCm platform even easier to adopt

For ROCm users and developers, AMD is continually looking for ways to make ROCm easier to use and easier to deploy on systems, and to provide learning tools and technical documents that support those efforts.

Helpful Resources:

The ROCm web pages provide an overview of the platform and what it includes, along with HPC & AI markets and workloads it supports.
ROCm Information Portal is a portal for users and developers that posts the latest ROCm versions along with API and support documentation. This portal also hosts ROCm learning materials to help introduce the ROCm platform to new users, as well as to provide existing users with curated videos, webinars, labs, and tutorials to help in developing and deploying systems on the platform.
AMD Infinity Hub gives you details on ROCm supported HPC applications and ML frameworks, and how to get the latest versions and install documents. You can also access the ROCm Application Catalog there, which includes an up-to-date listing of ROCm enabled applications.
Finally, learn more about our AMD Instinct™ MI200 Series of accelerators and partner server solutions in our AMD Instinct Server Solutions Catalog.

Bryce Mackin is Sr. Product Marketing Mgr. for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

Endnotes:

The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18

All performance and cost savings claims found herein have been provided by PIConGPU and have not been independently verified by AMD. Performance and cost benefits are impacted by a variety of variables. Results herein are specific to PIConGPU and may not be typical. GD-181

© 2023 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Instinct, ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective owners.