AMD Lab Notes Blog Series: AMD Instinct™ MI200 Series GPU Memory Space and AMD Profiling 101

guy_ludden · ‎04-12-2023

In this next installment of AMD lab notes, two blog posts have been released to help readers and developers further understand and optimize the performance of heterogeneous accelerated systems. The first post focuses on how to use memory effectively on architectures like the AMD Instinct™ MI200 Series family of accelerators; the second covers techniques for profiling GPU applications using relevant libraries and tools.

As always, please visit https://github.com/amd/amd-lab-notes/ for all the latest documentation with accompanying code examples. Highlights of the two posts follow:

MI200 GPU Memory Space Overview

To get the best performance on modern accelerated HPC systems, special care must be taken to ensure that data is in the right place at the right time. The HIP API supports a wide variety of options for controlling data across CPUs and GPUs to assist heterogeneous, accelerated processing. The memory architecture of GPUs like the AMD Instinct™ MI200 series accelerators can be complex, and understanding how to use it effectively is critical to optimizing performance. This high-level overview post will:

Introduce a set of commonly used memory spaces.
Identify what makes each memory space unique.
Discuss some common use cases for each space.

Specific topics include host vs device memory, pageable vs pinned host memory, and coarse-grained vs fine-grained coherence. Code snippets of key APIs and functions are shown throughout the post. The blog post focuses primarily on the AMD Instinct™ MI200 family of GPUs. However, many of the concepts discussed will carry over to other accelerators and APIs.

All technical content and accompanying code examples can be found at AMD Lab Notes (https://github.com/amd/amd-lab-notes/).

AMD Profiling 101

Performance tuning is the fundamental first step in optimizing a GPU application. To understand how the hardware is being used, developers need visibility into hardware activity and into how compute kernels consume resources. With AMD's profiling tools, developers can gain important insight into the "health" of their application and diagnose potential bottlenecks that contribute to poor performance. Developers targeting AMD GPUs have multiple tools available depending on their specific profiling use case. This blog post introduces the various profiling tools available and explains why a developer might choose one over another. It covers everything from low-level profiling tools to exhaustive profiling suites. The post also clarifies which other AMD tools bear the "profiler" classification but target AMD products outside the AMD Instinct™ GPU product line used for High Performance Computing (HPC). A series of follow-up blog posts will dive into the specifics of each tool and provide some examples.

The post introduces the following open-source tools to aid in application analysis:

AMD ROC profiler
Omniperf
Omnitrace
AMD Radeon™ GPU Profiler
uProf
3rd Party Tools

For more, read the introduction to AMD hardware profiling tools here.

Making the ROCm platform even easier to adopt

For ROCm users and developers, AMD is continually looking for ways to make ROCm easier to use and easier to deploy on systems, and to provide learning tools and technical documents that support those efforts.

Helpful Resources:

The ROCm web pages provide an overview of the platform and what it includes, along with markets and workloads it supports.
ROCm Information Portal is a portal for users and developers that posts the latest versions of ROCm along with API and support documentation. This portal also now hosts the ROCm Learning Center to help introduce the ROCm platform to new users, as well as to provide existing users with curated videos, webinars, labs, and tutorials to help in developing and deploying systems on the platform.
AMD Infinity Hub gives you details on ROCm-supported HPC applications and ML frameworks, and how to get the latest versions. You can also access the ROCm Application Catalog, which includes an up-to-date listing of ROCm-enabled applications.
Finally, learn more about our AMD Instinct MI200 Series of accelerators and partner server solutions in our AMD Instinct Server Solutions Catalog.

Justin Chang is a Software Design Engineer for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third-party sites are provided for convenience and, unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.