2024 was a transformative year for AMD Instinct™ accelerators. In the last 12 months we successfully delivered dozens of Instinct MI300X platforms to market across our cloud and OEM partners. We ramped numerous customers into volume production, including Microsoft, which achieved “market leading price-performance for GPT4 inferencing,” and Meta, which is using MI300X as the “exclusive inferencing solution for their frontier Llama 405B model.” Importantly, production deployments and pre-production engagements continue to grow across both established customers and emerging AI startups, for both inferencing and training applications.
As much progress as we have made with our hardware and platforms, accelerating our software roadmap remains our top priority. Our vision is for AMD ROCm™ to be the industry’s premier open AI stack, enabling choice and rapid innovation. We have made excellent progress at all layers of the stack this year. More than 1 million models on Hugging Face now work out of the box on AMD, and our platforms are well supported in leading frameworks like PyTorch and JAX, emerging compilers like OpenAI Triton, and serving solutions like vLLM and SGLang. We also continue strengthening the stack with support for key libraries, datatypes, and algorithms like Flash Attention 3, and we demonstrated excellent inference performance with our very first MLPerf submission earlier this year. To accelerate the cadence of software updates for our growing customer base, we have introduced biweekly optimized container releases in addition to regular ROCm releases that include new features and capabilities.
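A practical consequence of this out-of-the-box compatibility is that existing GPU code needs no changes: on a ROCm build of PyTorch, AMD GPUs are exposed through the familiar `torch.cuda` API. A minimal sketch (assuming PyTorch is installed; the model here is a stand-in for any model, such as one loaded from Hugging Face):

```python
# Minimal sketch: on a ROCm build of PyTorch, AMD GPUs are exposed through
# the standard torch.cuda API, so unmodified "cuda" device code runs on
# Instinct accelerators. Falls back to CPU so the snippet runs anywhere.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)   # stand-in for any model
x = torch.randn(2, 16, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([2, 4])
```

This device-agnostic pattern is why models written against CUDA-style PyTorch typically run on ROCm without source changes.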
Listening to users’ feedback on real-world product capabilities has been fundamental to our success. We also believe that open collaboration and feedback are crucial for driving innovation and building a robust ecosystem.
While we have made good progress in many areas, the breadth of AI workloads is both large and continuously evolving. We know there is more work to do to provide comprehensive support for the broad ecosystem. A recent article by SemiAnalysis highlighted some gaps in our training ecosystem support and provided constructive feedback on improving usability. We believe an open-source ecosystem for AI is in the industry’s best interest, and we always encourage community feedback as we incorporate improvements into subsequent ROCm releases. As such, we have an ambitious software roadmap for 2025 that incorporates many enhancements enabling easier adoption and improved out-of-the-box support for both inferencing and training applications.
Key priorities to support the broader ecosystem include:
- Expanded support for broad-based training. This means support and optimization for the latest algorithms, including Expert Parallelism (EP), Context Parallelism (CP), and Flash Attention 3. We will also support the latest datatypes and collectives across ML frameworks, including PyTorch and JAX, and popular training libraries such as DeepSpeed and MaxText, starting in Q1.
- Expanded inference support spanning LLMs, non-LLMs, and multi-modal models. This includes enhanced optimizations for popular frameworks and emerging serving solutions (e.g., vLLM, SGLang), improvements to underlying libraries (GEMMs, selection heuristics), introduction of next-generation AI operators (e.g., advanced Attention, fused MoE), and further fine-tuning of new data types.
- Richer out-of-the-box support across operators, collectives, and common libraries to make it easier and faster to deploy our solutions. This includes packaged tooling, more deployment options, and ongoing documentation extensions.
- Frequent and easy-to-consume performance updates, while maintaining high-quality, stable ROCm releases. We began offering these biweekly updates for inferencing earlier this year and are actively expanding them to cover training as well. The first training Docker container was released on December 16th, and the next drop is planned for December 30th.
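The attention work called out above (Flash Attention 3, advanced attention operators) reaches most users through framework-level APIs rather than direct kernel calls. A minimal sketch, assuming PyTorch is installed: `torch.nn.functional.scaled_dot_product_attention` dispatches to fused, Flash-Attention-style kernels on backends that provide them and falls back to a reference math implementation otherwise.

```python
# Minimal sketch: scaled_dot_product_attention is the framework-level entry
# point that dispatches to fused (Flash-Attention-style) kernels when the
# backend provides them; otherwise it falls back to a math implementation.
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 1, 8, 128, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Causal self-attention, as used in decoder-only LLMs.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```

Because the dispatch happens inside the framework, kernel-level optimizations land for users automatically, without application code changes.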
We invite the community to join us on this journey to make ROCm even better. Together, we can build a robust open-source ecosystem for AI and high-performance computing.
Please stay tuned for further updates on the ROCm ecosystem, developer enablement, and performance progress via our ROCm community, AMD Infinity Hub, or Discord.