NAMD 3.0 Delivers Scalable Molecular Dynamics for GPUs

guy_ludden · 06-21-2022

Molecular dynamics is a supremely useful method within computer simulation, capable of eliciting valuable scientific predictions in a wide range of fields from chemical physics to materials science to biophysics. One of the most highly regarded applications for calculating molecular dynamics is NAMD (Nanoscale Molecular Dynamics). It was one of eight workloads chosen by the Center for Accelerated Application Readiness to be prepared to run on Oak Ridge National Laboratory’s (ORNL) Frontier Supercomputer, the first exascale supercomputer in the USA, which is currently the fastest supercomputer in the world.*

NAMD is capable of simulating huge systems of up to two billion atoms. It has been used to provide invaluable insights into the virus that causes COVID-19, promising significant breakthroughs in understanding how viruses behave. Molecular dynamics has so much to offer that NAMD has recently been upgraded to provide scalability that doesn’t require a supercomputer budget, along with the ability to match GPU and CPU capabilities on the host platform. The key enhancement has been enabling the code to run almost entirely on the GPU, so that the rapid development of processing power from accelerators such as the AMD Instinct™ MI200 series can be harnessed more easily.

“We now have a version of NAMD doing dynamics almost entirely on the GPU so that it's extremely fast, then we can get that much more sampling done if we're on a multi-GPU node,” says Dr. David J. Hardy, Senior Research Programmer in the NIH Center for Macromolecular Modeling and Bioinformatics, the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign, and lead developer of NAMD.

Molecular dynamics is all about modelling the interaction between atoms over time. Without multi-GPU scaling, the sweet spot to take full advantage of a GPU is between 100,000 and 200,000 atoms. But with multi-GPU scaling, this increases considerably. “When you're scaling across multiple GPUs, our test case has been a one million atom system,” says Hardy. The original NAMD approach, as implemented on Frontier for example, uses GPU offload: the accelerator only calculates forces and does not preserve state on the GPU between time steps. As a result, the CPU can become a bottleneck, particularly as GPU performance races upwards.

To counteract this, NAMD 3.0 was created. “We did a lot of rewriting to make the code GPU resident,” says Hardy. “This GPU resident version of the code is maintaining data between time steps on the GPU, so you no longer pay the penalty of host device memory transfers that we had to do with the GPU offload model. It’s a huge payoff for us, a doubling of performance when running on a single GPU.” NAMD 3.0 can also scale across multiple GPUs on a single node. The code still retains the multi-node, GPU-offload mode of execution, with the mode chosen via a simple configuration setting.
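The trade-off Hardy describes can be illustrated with a toy cost model. This is a minimal sketch, not NAMD code: the class, the function names, and every timing value are hypothetical and chosen only to show where the per-step transfer penalty enters in the offload model and why it disappears when data stays on the GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    """Hypothetical per-time-step costs in milliseconds (not measured)."""
    gpu_force_ms: float       # force calculation on the GPU
    cpu_integrate_ms: float   # integration on the CPU (offload mode)
    gpu_integrate_ms: float   # integration on the GPU (resident mode)
    transfer_ms: float        # one host<->device copy


def offload_step_ms(c: StepCosts) -> float:
    # Offload: copy positions to the GPU, compute forces, copy forces back,
    # then integrate on the CPU. Two transfers are paid on every step.
    return c.transfer_ms + c.gpu_force_ms + c.transfer_ms + c.cpu_integrate_ms


def resident_step_ms(c: StepCosts) -> float:
    # Resident: data stays on the GPU between steps, so no per-step transfers.
    return c.gpu_force_ms + c.gpu_integrate_ms


# Made-up numbers for illustration only.
costs = StepCosts(gpu_force_ms=1.0, cpu_integrate_ms=0.5,
                  gpu_integrate_ms=0.125, transfer_ms=0.25)
speedup = offload_step_ms(costs) / resident_step_ms(costs)
print(f"offload:  {offload_step_ms(costs):.3f} ms/step")
print(f"resident: {resident_step_ms(costs):.3f} ms/step")
print(f"speedup:  {speedup:.2f}x")
```

In this simplified picture the gain comes from two places: the transfers vanish, and integration moves off the CPU. The real speedup depends on the system size and hardware, which this model does not attempt to capture.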

The increase in performance from GPU-resident code means that the cost of each time step calculation is lower. “We're able to run more time steps per day,” says Hardy. NAMD 3.0 can therefore deliver more nanoseconds of simulation per day, so research can be performed more quickly and results obtained sooner. “Sampling rates are key for the end users of NAMD because they correlate with the biological phenomena that they're usually studying. Doubling the sampling rates with GPU-resident code allows end users to simulate phenomena in a more reasonable timeframe.” For vaccine and drug research, for example, this can mean identifying new candidates for live testing more quickly.
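The link between time per step and nanoseconds per day is simple arithmetic, sketched below. The function name is hypothetical, and the 2 fs time step and 2 ms per step are assumed values for illustration, not figures from Hardy or from NAMD benchmarks.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def ns_per_day(timestep_fs: float, seconds_per_step: float) -> float:
    """Simulated nanoseconds per wall-clock day.

    timestep_fs: simulated time advanced by one step, in femtoseconds.
    seconds_per_step: wall-clock time needed to compute one step.
    """
    if timestep_fs <= 0 or seconds_per_step <= 0:
        raise ValueError("timestep_fs and seconds_per_step must be positive")
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


# Assumed values: a 2 fs time step at 2 ms of wall-clock time per step.
baseline = ns_per_day(timestep_fs=2.0, seconds_per_step=0.002)
# Halving the wall-clock cost of each step doubles the simulated time per day.
faster = ns_per_day(timestep_fs=2.0, seconds_per_step=0.001)
print(f"baseline: {baseline:.1f} ns/day, faster: {faster:.1f} ns/day")
```

Because ns/day is inversely proportional to the time per step, a doubling of per-step performance translates directly into a doubling of the sampling rate.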

The capabilities of NAMD have also been greatly assisted by AMD HIP, which allows developers familiar with NVIDIA CUDA to transition their code across to AMD Instinct GPUs with ease. “With a little bit of extra header file and macro magic, we're able to use HIPIFY to achieve really good AMD GPU acceleration,” says Hardy. “That's been tremendously helpful for us.” This has also been essential because ORNL’s Frontier gets most of its FLOPS from its AMD Instinct GPUs, meaning NAMD had to be AMD-native to take advantage of this exascale supercomputer’s immense processing power.

“Now that the GPU is this really wonderful commodity device, that's what we want to be able to support NAMD on,” says Hardy. “We're very excited that we're as far along with being able to run NAMD fast on AMD GPUs. We're very thankful for HIP and HIPIFY technology and how easy our transition has been to be able to run on AMD. We haven't had to introduce an entirely new code path, which would've been a huge barrier to being able to get good performance.”

Looking forward, the roadmap for NAMD is to scale the GPU-resident code across multiple nodes as well as multiple GPUs within a single node. Right now, multi-GPU scaling requires a special interconnect to allow direct memory access between GPUs on the same node. “We're going to have to adopt some different communication technology to be able to do this,” says Hardy, referring to RDMA (Remote Direct Memory Access) to communicate directly between GPUs on different nodes.

The GPU-resident version of NAMD will also require expanded capabilities to bring it in line with the features of the offload version. “Molecular dynamics is a very flexible methodology, and because of that, there are a lot of different advanced features that have been introduced into NAMD to exploit various mechanisms in molecular dynamics to do things from a methodological point of view to accelerate your calculations. When you’re doing molecular dynamics, part of what you're trying to see are changes as you're crossing different energy barriers,” according to Hardy.

“A lot of these enhanced sampling methods allow you to better cross these energy barriers,” continues Hardy. “This could be by pulling on part of the structure or by introducing directed forces into the structure or by modifying the interactions themselves to flatten the energy surface a bit better so that you can do a faster sampling across this energy surface and visit the different conformational states of interest for whatever it is that you're simulating. We only have a handful of these now supported on the single GPU-resident version. We don't have any of them yet that are quite working on the multi-GPU version. A big thing for us now is to also be able to support these different advanced features on the GPU-resident code.”

NAMD 3.0 is already providing unparalleled levels of simulation speed via its GPU-resident update. Once the GPU-resident version of NAMD is brought in line with the capabilities of the offload version, it will unleash even more flexibility for molecular dynamics research. It will put high-performance tools in the hands of a wider range of researchers to make new scientific discoveries even more quickly.

Learn more about NAMD and NAMD 3.0 on AMD Infinity Hub.

Bryce Mackin is Sr. Product Marketing Mgr. for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

All performance and cost savings claims are provided by Dr. David J. Hardy and have not been independently verified by AMD. Performance and cost benefits are impacted by a variety of variables. Results herein are specific to the NIH Center for Macromolecular Modeling and Bioinformatics and may not be typical. GD-181

* See the Top500 list for June 2022.