Accelerated computing has taken the industry by storm, dramatically changing how software applications, including AI and HPC workloads, are developed and tuned for maximum impact. GPGPU solutions, such as those from AMD, have been pivotal in enabling advances in both AI and HPC. In the AI space, their massive parallel processing capabilities power the AI revolution by supporting key frameworks like PyTorch, TensorFlow, and JAX. At the same time, their raw computing power delivers the performance necessary to optimize HPC applications like GROMACS for molecular dynamics simulation, OpenFOAM for computational fluid dynamics, NAMD for molecular modeling, and Ansys Mechanical for computer-aided engineering. By providing a robust, unified software stack that caters to both AI and traditional HPC workloads, these accelerated computing platforms are driving breakthroughs across scientific and industrial computing. The versatility of GPGPU architectures makes them the preferred solution for designing, developing, and deploying AI and HPC at scale. This capability is underpinned by robust software ecosystems whose widely used AI and machine learning libraries and tools streamline development and deployment.
Between the two leading GPGPU solution providers, the commonality in GPGPU, memory, and software architecture, along with shared AI ecosystem support, makes switching far more convenient and portable for users and developers than moving between custom ASIC AI accelerators targeted at specific use cases. But small, nuanced differences can at times create performance gaps that are not easily understood or overcome without documentation and support. For example, the thousands of AI models hosted on Hugging Face will run on any GPGPU architecture, but there can be significant performance differences depending on which GPU was used to develop and optimize a model.
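To make that portability concrete, here is a minimal sketch, assuming a ROCm-enabled PyTorch build and the `transformers` package (the model id is only an illustrative example, not one named in this post). PyTorch on ROCm exposes AMD GPUs through the familiar `cuda` device, so the same Hugging Face code runs unchanged across GPGPU vendors:

```python
# Minimal portability sketch (illustrative assumption, not an AMD recipe).
# PyTorch on ROCm exposes AMD GPUs through the familiar "cuda" device,
# so Hugging Face code runs unchanged across GPGPU vendors.
from transformers import pipeline

# The model id below is only an example; any Hugging Face model id works the same way.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # GPU 0, whether the backend underneath is CUDA or ROCm
)
print(classifier("The same model code runs on any GPGPU architecture."))
```

The code is portable; the performance, as noted above, may not be.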
These performance differences stem from variation in the number of compute engines, warp (wavefront) sizes, memory size and bandwidth, and other characteristics that need tuning to extract maximum performance. At a deeper level, some operations, such as GEMMs, can be implemented using more than one library or more than one technique. How does one know which implementation is the fastest and should be chosen for AMD Instinct GPU accelerators?
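One practical answer is to measure. As a hedged illustration (the problem size, dtype, and the two call paths below are our assumptions, not an AMD recommendation), this sketch queries the device characteristics that drive tuning decisions and times the same GEMM through two PyTorch entry points:

```python
# Hypothetical micro-benchmark (our sketch, not an AMD recipe): query the
# device characteristics that drive tuning, then time the same GEMM through
# two PyTorch call paths to see which is faster on the GPU at hand.
import torch

assert torch.cuda.is_available()  # on ROCm builds, AMD GPUs appear as "cuda"
props = torch.cuda.get_device_properties(0)
# warp_size reports the wavefront width: typically 64 on AMD Instinct GPUs
print(f"{props.name}: warp/wavefront={props.warp_size}, "
      f"memory={props.total_memory / 2**30:.0f} GiB")

def time_gemm(fn, *args, iters=50):
    """Average milliseconds per call, measured with GPU timing events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(5):            # warm-up lets the backend select/tune a kernel
        fn(*args)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

m = n = k = 4096                  # illustrative problem size
a = torch.randn(m, k, device="cuda", dtype=torch.float16)
b = torch.randn(k, n, device="cuda", dtype=torch.float16)

# Both calls compute a @ b; the faster path can differ by GPU and problem size.
print("torch.matmul:", time_gemm(torch.matmul, a, b), "ms")
print("F.linear:    ", time_gemm(torch.nn.functional.linear, a, b.t()), "ms")
```

On most AMD Instinct accelerators the reported wavefront size is 64 rather than the 32 common elsewhere, one example of the nuanced differences the blogs will unpack.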
This new series of blogs will guide readers through the differences between GPGPU architectures and their ecosystems, and demonstrate what is required to successfully deploy optimized models on AMD ROCm software. To further enhance community engagement and knowledge sharing, AMD has launched a new destination dedicated to blogs on best practices and optimization for demanding AI and HPC solutions. Through this blog site, we hope to:
- Increase community engagement: A technical blog and best-practices guide can serve as a hub for delivering the latest news about the ROCm ecosystem, enhancing community engagement and knowledge sharing.
- Improve ease of use and adoption: A technical how-to guide can make the development and deployment process easier and more efficient. It provides developers with the necessary resources to understand and effectively use ROCm.
- Create visibility: A blog platform showcases the latest advancements and use cases of ROCm, keeping the community updated.
- Encourage collaboration: Engage members of the ROCm community to participate and collaborate directly on the latest developments.
Unveiling the AMD Blog Platform
The newly launched AMD ROCm blogs page is designed to make it easy to navigate and explore new content in the world of ROCm software. It provides a glimpse of featured blogs, highlighting the most recent and compelling stories about ROCm software and AMD's accelerated computing advancements. We currently showcase the following topics:
- Implementations of mathematical algorithms, such as partial differential equation discretization, linear algebra, and solvers
- Optimizations for artificial intelligence and high-performance computing applications and machine learning models
- Tips and tricks to leverage ROCm tools, the ROCm software stack, and hardware-level optimizations
- Ecosystem and partner solutions
Whether it's exploring performance optimizations, understanding programming models, or delving into specific use cases, the ROCm blog platform caters to audiences across various proficiency levels.
Developers can expect to learn:
- ROCm setup and Docker containers to run the latest models like Stable Diffusion (a quick environment check is sketched after this list)
- Pretraining, fine-tuning, deploying, and serving state-of-the-art deep learning models across various frameworks on Instinct and other supported AMD GPUs
- Running benchmarks on Instinct GPUs using ROCm
- Using advanced machine learning libraries like XGBoost and frameworks like JAX and DeepSpeed with ROCm, Instinct, and other supported AMD GPUs
- AMD's experience and best practices for porting and optimizing applications, libraries, and frameworks for AMD GPUs using software tools
- Useful references and guides to quickly find the resources needed for development work
- Sample code for readers to further experiment with optimization strategies
- How to leverage third-party tools and platforms such as profilers, serving platforms, and parallel processing platforms
- Use cases from industry leaders on how they leverage ROCm and AMD GPUs to develop and deploy their services at scale
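As referenced in the first item above, here is a minimal environment check, a sketch assuming a ROCm-enabled PyTorch build rather than an official procedure, to verify the GPU is visible before trying any of the recipes:

```python
# Quick sanity check before following the blog recipes (a sketch that
# assumes a ROCm-enabled PyTorch build; not an official AMD procedure).
import torch

print("PyTorch:", torch.__version__)
print("HIP runtime:", torch.version.hip)   # set on ROCm builds, None on CUDA builds
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```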
As technology evolves, the AMD ROCm Blog Platform remains committed to staying at the forefront of innovation. With regular updates, insightful discussions, and a dedicated community, the platform is poised to be the go-to destination for those passionate about exploring the vast potential of GPU computing with AMD's ROCm Software.
The ROCm blogs enable developers with Instinct GPUs to leverage AMD's latest AI innovations for real-world use cases, providing guidance on implementing cutting-edge AI workloads, models, and optimizations using AMD GPUs. The blogs share best practices for porting and optimizing scientific applications and for training, fine-tuning, and serving inference on AMD hardware, empowering the community to replicate recent AMD research advances and apply them to impactful projects.
“ROCm’s journey began about a decade ago and I’ve been fortunate to see half of it. As I reflect on the journey past, it’s important to look ahead. ROCm is at an inflection point of our story. With our new AMD Instinct™ MI300 series of GPUs unlocking the capabilities of our ROCm software, it is more important than ever to share our stories, knowledge, and achievements. Our new ROCm blogs website enables us to reach you while staying true.”
Saad Rahim, Product Manager, ROCm
AMD Lab Notes Transition Plan
All existing and future blogs from the GPUOpen "AMD Lab Notes" series will fully transition to the new ROCm blogs website. The GPUOpen-based series kicked off at the end of 2022, with GPUOpen intended as a temporary home for hosting and rendering git-based technical blogs until the ROCm Blogs website was developed. With the launch of the new git-based ROCm blogs page, hosting the lab notes alongside other technical material presents a more unified ROCm content strategy, and it enables developers to easily search and access all ROCm documentation from a more centralized location. To make the transition as seamless as possible, all existing AMD Lab Notes blogs on GPUOpen will contain a redirect to their new homes by the end of 2024.
You can access the AMD ROCm blog platform directly by visiting the AMD ROCm Developer Hub and selecting the Read Blogs button in the ROCm Blogs tile near the top of the page, or by going directly to the ROCm blog. The new AMD ROCm blog page stands ready to provide you with the latest information as you embark on your journey of discovery with AMD ROCm Software.
The GitHub repository containing all ROCm blogs and associated code examples can be found at https://github.com/ROCm/rocm-blogs. If you have any questions or comments, please reach out to us on GitHub Discussions.
Want to read more and start deploying optimized models?
- Efficient image generation with Stable Diffusion models and AITemplate using AMD GPUs
- Fine-tune Llama 2 with LoRA: Customizing a large language model for question-answering
- Efficient deployment of large language models with Text Generation Inference on AMD GPUs
- Using LoRA for efficient fine-tuning: Fundamental principles
- LLM distributed supervised fine-tuning with JAX
- Accelerating XGBoost with Dask using multiple AMD GPUs
- PyTorch Lightning on AMD GPUs
- Pre-training a large language model with Megatron-DeepSpeed on multiple AMD GPUs
- Pre-training BERT using Hugging Face & PyTorch on an AMD GPU
- Pre-training BERT using Hugging Face & TensorFlow on an AMD GPU