From high-performance computing, deep learning, and rendering systems to cloud computing, training complex neural networks, and AMD’s ROCm open ecosystem, these blogs offer insights and updates on our products and solutions.
In the dynamic landscape of technology, particularly in GPU-accelerated computing, staying current with the latest developments is paramount. AMD has been consistently pushing boundaries with its AMD ROCm™ Software stack, catering to the diverse needs of developers and researchers harnessing the power of GPUs for computation-intensive AI and HPC applications.
We are at a stage in our product ramp where we are consistently identifying new paths to unlock performance with our ROCm software and AMD Instinct MI300 accelerators. We have made significant progress since the data we recorded in November for our launch event, and we are delighted to share our latest results highlighting these gains.
These gains show that the AMD Instinct MI300X with ROCm 6 continues to deliver leadership inference performance using the popular FP16 datatype and the vLLM inference library, compared to the Nvidia H100 using TensorRT-LLM with FP16 or FP8 datatypes.
The newest family of AMD accelerators, the AMD Instinct™ MI300 Series, featuring the third-generation Compute DNA (AMD CDNA™ 3) architecture, offers two distinct variants designed to address the AI and HPC markets.
Most Machine Learning (ML) engineers use the single-precision (FP32) datatype for developing ML models. TensorFloat32 (TF32) has recently become popular as a drop-in replacement for these FP32-based models. However, there is a pressing need to provide additional performance gains for these models by using faster datatypes, such as BFloat16 (BF16), without requiring additional code changes.
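To see why BF16 can act as a drop-in replacement for FP32, note that BF16 keeps FP32's full 8-bit exponent (so the dynamic range is preserved) and simply drops the lower 16 bits of the mantissa. The sketch below, a simplified illustration using only the Python standard library (real hardware converters typically round to nearest even rather than truncate), shows the conversion and the resulting precision loss:

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate an FP32 value to its top 16 bits (a simplified BF16).

    BF16 shares FP32's 1 sign bit and 8 exponent bits but keeps only
    7 of FP32's 23 mantissa bits, so range is preserved while
    precision is reduced.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16  # keep sign + exponent + top 7 mantissa bits

def bf16_to_f32(bf16_bits: int) -> float:
    """Re-expand BF16 bits to FP32 by zero-filling the dropped mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", bf16_bits << 16))[0]

x = 3.14159265
bf = bf16_to_f32(f32_to_bf16_bits(x))
# bf is 3.140625: the exponent (and hence the magnitude) survives intact,
# but only about 2-3 decimal digits of mantissa precision remain.
```

Because the exponent field is unchanged, any value representable in FP32 stays finite in BF16, which is what lets frameworks swap the datatype under existing FP32 model code without rescaling (unlike FP16, whose 5-bit exponent can overflow).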