At AMD, we believe that AI is the most important development to impact computing and technology in the past 50 years, and we remain dedicated to driving innovation and collaboration across the AI industry.
Through our open software ecosystem, ROCm™, and our cutting-edge AI accelerators, we are shaping the future of AI. In this update, we'll talk about our commitment to open-source innovation, what we have done to optimize ROCm and the AMD Instinct™ MI300X, and the continued expansion of our open ecosystem for AI.
The AMD AI Software Ecosystem Continues to Mature
The AMD ROCm open software ecosystem continues to mature. Our software developers have spent the past few months identifying and implementing optimizations for the current and future versions of ROCm, making sure it enables easy, out-of-the-box setup and delivers incredible performance on AMD Instinct MI300X solutions.
Some of these key optimizations include:
- FlashAttention-2 support has been publicly released for MI300X
- vLLM support for MI300X is now available starting with version 0.3.0 (see the usage sketch after this list)
- Quantization support was added via bitsandbytes, AWQ, and GPTQ
- OpenAI has merged AMD GPU support into upstream Triton, and it is now available to users, with a full release planned for OpenAI Triton 3.0
- HIP Graph support will be part of the ROCm 6.1 release in Q1 2024
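To make the list above concrete, here is a minimal sketch of what out-of-the-box LLM serving with vLLM on an 8-GPU MI300X node can look like. This assumes vLLM 0.3.0 or later built with the ROCm backend; the model name, prompts, and sampling settings are illustrative placeholders, not an AMD-published recipe.

```python
# Minimal vLLM inference sketch for an 8x MI300X node (assumes vLLM >= 0.3.0
# built with ROCm support). Model name and sampling values are placeholders.
from vllm import LLM, SamplingParams

prompts = [
    "The key advantage of an open AI software ecosystem is",
    "Large language models can be served efficiently by",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=100)

# tensor_parallel_size=8 shards the model across all eight GPUs in the node.
# For AWQ/GPTQ checkpoints, vLLM also accepts a quantization= argument
# (support on ROCm may vary by version).
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=8)

for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```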
These optimizations have been upstreamed to our open-source collaborators and are available for public use, and further optimizations are planned for the ROCm 6.1 release set for Q1. These are the optimizations that helped the AMD Instinct MI300X accelerator deliver incredible performance, including the 2.1x Llama-70B inference latency (median) and 1.6x BLOOM 176B throughput results (1) we showed in December.
ElioVP cloud discussed their use of the AMD Instinct MI300X, with these optimizations, on the BLOOM 176B model with the ZeRO-Inference technique, seeing great results. Supermicro also posted about their real-world performance experience with the AMD Instinct MI300X.
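For readers curious what the ZeRO-Inference technique looks like in practice, below is a minimal sketch using DeepSpeed ZeRO stage 3 with CPU parameter offload alongside Hugging Face Transformers, following the documented non-Trainer ZeRO inference pattern. The model name, config values, and prompt are illustrative assumptions, and this is not ElioVP's actual setup.

```python
# Minimal ZeRO-Inference sketch: keep BLOOM's weights in host memory and
# stream them to the GPUs layer by layer, so a 176B-parameter model can run
# on a single node. Launch with: deepspeed --num_gpus 8 zero_inference.py
# All names and config values here are illustrative assumptions.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.integrations import HfDeepSpeedConfig

model_name = "bigscience/bloom"  # 176B parameters

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition parameters across ranks
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must exist before from_pretrained() so weights load straight into ZeRO stage 3.
dschf = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

# PyTorch on ROCm exposes AMD GPUs through the "cuda" device name.
inputs = tokenizer("Open ecosystems matter because", return_tensors="pt").to("cuda")
with torch.no_grad():
    # synced_gpus keeps all ranks stepping together under ZeRO stage 3.
    output = engine.module.generate(**inputs, max_new_tokens=100, synced_gpus=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```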
New Cloud AI Availability for AI Developers
As developers and technologists continue to advance AI, access to high-performance AI accelerators is critical to fostering open and collaborative innovation. AMD is working with numerous cloud service providers and OEMs/ODMs, including Microsoft Azure, Oracle Cloud Infrastructure, Dell Technologies, Supermicro, Lenovo, and others, to bring AMD Instinct MI300X powered systems to market. These systems and instances will be available in the coming months.
More immediately, Lamini is already using AMD Instinct MI300X and MI210 accelerators to help companies build out enterprise LLM offerings. They are up and running at scale and seeing tremendous benefits.
“We've only bought AMD GPUs so far, and earlier this year purchased AMD Instinct MI300X for our LLM platform. With just turning it on, we immediately saw an out-of-the-box 5X performance bump compared to the MI250X in our previous cluster—zero modifications. The 192GB of HBM memory capacity is enormous, which amounts to a whopping 1.5TB for a node of 8 GPUs, allowing our customers to run and scale huge LLMs. Beyond performance, we're seeing amazing demand for our LLM solutions with MI300X, ranging from Fortune 500 enterprises to leading tech unicorns, who all have the same goal—successfully getting their proprietary data into the most valuable data derivative today: LLMs. All possible on AMD.” - Sharon Zhou, CEO, Co-Founder, Lamini.
AMD is also working with other cloud service providers, like TensorWave, to open new avenues of AI accelerator availability. TensorWave has AMD Instinct MI300X accelerators available now and is seeing increased demand for the product.
Advancing Open-Source AI
AMD is continuing to drive open-source innovation for AI ecosystems because we believe it is the best path forward for the industry and the best way to enable AI developers to do their best work.
By making our software and hardware platforms open, we are enabling developers to build and deploy AI applications more quickly and easily. This is why we have joined key partnerships like the AI Alliance, which consists of companies, startups, universities, research and government organizations, and non-profit foundations working to innovate across all aspects of AI technology, applications, and governance in an open and transparent way. We are also creating a vibrant community of developers working together to solve the challenges of AI with the launch of the AMD ROCm Developer Hub, a single place to access all ROCm developer resources: documentation, training webinars, the latest optimization blogs, and more.
We are excited to see the open ecosystem for AI continue to grow. We believe this openness will accelerate innovation and make AI more accessible to everyone, and we are committed to working with our technology partners to grow the ecosystem and make it easier for developers to build and deploy AI applications.
------------------------------------------------------------------
Endnotes
1) Token generation throughput using DeepSpeed Inference with the BLOOM-176B model, an input sequence length of 1948 tokens, an output sequence length of 100 tokens, and a batch size tuned to yield the highest throughput on each system. Comparison based on AMD internal testing using a custom Docker container for each system as of 11/17/2023.
Configurations:
2P Intel Xeon Platinum 8480C CPU-powered server with 8x AMD Instinct™ MI300X 192GB 750W GPUs, pre-release build of ROCm™ 6.0, Ubuntu 22.04.2.
Vs.
An Nvidia DGX H100 with 2x Intel Xeon Platinum 8480CL Processors, 8x Nvidia H100 80GB 700W GPUs, CUDA 12.0, Ubuntu 22.04.3.
8 GPUs on each system were used in this test.
Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations. MI300-34