Today in San Francisco, California, AMD held a special event where we announced the newest additions to the Radeon Instinct™ family of compute products: the AMD Radeon Instinct™ MI60 and Radeon Instinct™ MI50. Alongside the new hardware, the Radeon Open eCosystem (ROCm) has been updated with major improvements to the device drivers, the compilers and the supporting tools. The low-level math libraries, along with MIOpen, the machine intelligence library, have been optimized to get the most out of deep learning applications on the new GPUs.
ROCm is an open software platform for GPU-accelerated high-performance computing (HPC). It was designed with developers in mind and built to accommodate emerging workloads, including machine learning and artificial intelligence. As an open platform, the ROCm ecosystem provides a rich foundation of modern programming languages, designed to speed the development of high-performance, energy-efficient heterogeneous computing systems.
We enabled AMD’s ROCm-capable GPUs in the Linux ecosystem so that deep learning applications are easy to deploy on Linux distributions. The amdkfd device driver is now part of the mainline kernel, which all the major distributions pick up for their standard releases. Support for the MI60 and MI50, which are based on the new Vega architecture, is now also available in the linux-next repository. For distributions that do not ship the latest kernel, a DKMS build is still a viable way to add support for the MI60 and MI50 GPUs.
We have updated the LLVM-based Clang compiler to support the new GPU architecture, including the new compute instructions that target machine learning workloads. These low-level instructions cover compute operations ranging from single-bit precision up to 64-bit floating point. The most beneficial instruction for accelerating deep learning training is an FP16 dot product that accumulates into a 32-bit floating-point result, which preserves the accuracy of the operation.
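To illustrate why accumulating FP16 products into a 32-bit result matters, here is a minimal CPU emulation in NumPy. It is not the GPU instruction itself and does not use any ROCm API. The function names `dot_fp16_accumulate` and `dot_fp16_inputs_fp32_accumulate` are hypothetical, chosen only for this sketch. Both functions take FP16 inputs. The first keeps its running sum in FP16, while the second keeps it in FP32. The product of two FP16 values is exact in FP32, because two 11-bit significands multiply to at most 22 bits, which fits in FP32's 24.

```python
import numpy as np


def dot_fp16_accumulate(a, b):
    """Dot product with FP16 inputs AND an FP16 running sum (hypothetical helper)."""
    acc = np.float16(0.0)
    for x, y in zip(a, b):
        # Every intermediate sum is rounded back to float16.
        acc = np.float16(acc + np.float16(x) * np.float16(y))
    return acc


def dot_fp16_inputs_fp32_accumulate(a, b):
    """Dot product with FP16 inputs and an FP32 running sum (hypothetical helper)."""
    acc = np.float32(0.0)
    for x, y in zip(a, b):
        # The FP16 x FP16 product is exact in float32; only the sum is rounded.
        acc = np.float32(acc + np.float32(np.float16(x)) * np.float32(np.float16(y)))
    return acc


a = np.full(4096, 0.1, dtype=np.float16)  # float16(0.1) == 0.0999755859375
b = np.ones(4096, dtype=np.float16)
print(dot_fp16_accumulate(a, b))              # 256.0 (sum stalls)
print(dot_fp16_inputs_fp32_accumulate(a, b))  # 409.5 (exact)
```

With an FP16 accumulator, the sum stops growing once it reaches 256. At that magnitude the spacing between adjacent FP16 values is 0.25, so adding roughly 0.1 rounds back to the same value. The FP32 accumulator reaches the exact result, 4096 × 0.0999755859375 = 409.5.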
The profiling and debugging tools have also been updated to support the new hardware. These tools help developers get the most out of every GPU compute cycle and understand where the bottlenecks in their applications occur. Follow the development on our GitHub site.
The math libraries were tailored to the hardware architecture, resulting in a highly optimized solution. Because there are many different ways to optimize these math operations, and each specific matrix and convolution size needs its own tuning, AMD built a tool called Tensile to help automate the optimization process. Tensile is useful for creating libraries for GEMMs, GEMM-like problems (such as batched GEMM), N-dimensional tensor contractions, and anything else that multiplies two multi-dimensional objects together on a GPU. MIOpen has also undergone extensive optimization so that deep learning frameworks can take full advantage of the underlying math libraries.
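The idea behind per-size tuning can be sketched in a few lines. This is not Tensile and does not reflect its interface, which generates and benchmarks GPU kernels. It is a CPU stand-in, assuming a simple blocked matrix multiply whose only tunable parameter is a square tile size. The helpers `blocked_gemm` and `tune_tile` and the candidate tile sizes are hypothetical. The sketch times each candidate on a given (m, n, k) problem and keeps the fastest one.

```python
import time

import numpy as np


def blocked_gemm(A, B, tile):
    """Compute C = A @ B one (tile x tile) block at a time (hypothetical helper)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # NumPy slicing clips at the array edge, so ragged tiles work.
                C[i:i + tile, j:j + tile] += A[i:i + tile, p:p + tile] @ B[p:p + tile, j:j + tile]
    return C


def tune_tile(m, n, k, candidates=(16, 32, 64, 128), repeats=3, seed=0):
    """Benchmark each candidate tile size on one problem size; return the fastest."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, k)).astype(np.float32)
    B = rng.standard_normal((k, n)).astype(np.float32)
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_gemm(A, B, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings


# Build a small lookup table: problem size -> best tile size on this machine.
table = {}
for shape in [(64, 64, 64), (256, 128, 512)]:
    table[shape], _ = tune_tile(*shape)
print(table)
```

The winning tile size depends on the machine and on the problem shape, so no particular output is implied. Storing one result per problem size mirrors the point made above: each matrix size is tuned separately instead of sharing a single configuration.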
One of the most exciting developments over the past year has been the progress in integrating with machine learning frameworks. ROCm has been updated to support TensorFlow v1.11, and we are actively upstreaming the code into the main TensorFlow repository. Check out the TensorFlow GitHub to follow the updates, or see our GitHub page for PyTorch, Caffe2, Caffe and other framework developments.
To try out the newest packages, develop an application and easily deploy a ROCm solution, get the most recent Docker images here. They save you the time of collecting all the libraries and building them specifically for your platform.
We are always looking for skilled developers excited to work in this rapidly changing field. Check out our job listings at amd.com.