Introducing the First AMD 1B Language Models: AMD OLMo

Core Contributors: Jiang Liu, Jialian Wu, Prakamya Mishra, Zicheng Liu
Contributors: Sudhanshu Ranjan, Pratik Prabhanjan Brahma, Yusheng Su, Gowtham Ramesh, Peng Sun, Zhe Li, Dong Li, Lu Tian, Emad Barsoum

 

Introduction

In recent years, the rapid development of artificial intelligence technology, especially the progress in large language models (LLMs), has garnered significant attention and discussion. From the emergence of ChatGPT to subsequent models like GPT-4 and Llama, these language models have demonstrated remarkable capabilities in natural language processing, generation, understanding, and reasoning. Continuing the AMD tradition of open-sourcing models and code to help the community advance together, we are excited to release our first series of fully open 1 billion parameter language models, AMD OLMo.

Why Build Your Own Language Models?

The ability to pre-train and fine-tune your own LLM helps incorporate domain-specific knowledge and ensures better alignment with unique use cases. This approach allows organizations to tailor the model’s architecture and training process to their specific requirements, achieving a balance between scalability and specialization that off-the-shelf models may not provide. As the demand for customized AI solutions continues to grow, the ability to pre-train LLMs unlocks unprecedented opportunities for innovation and product differentiation across industries. Aligned with the goal of advancing accessible AI research, AMD has open-sourced the complete training details and released the checkpoints for the first series of AMD OLMo models. This initiative empowers a diverse community of users, developers, and researchers to explore, utilize, and train state-of-the-art large language models. By demonstrating the capabilities of AMD Instinct™ GPUs on demanding AI workloads, AMD aims to highlight their potential for running large-scale multi-node LM training jobs with trillions of tokens and achieving improved reasoning and instruction-following performance compared to other fully open, similarly sized LMs. In addition, the community can run such models on AMD Ryzen AI PCs equipped with Neural Processing Units (NPUs) using AMD Ryzen AI software, enabling easier local access without privacy concerns, efficient AI inference, and lower power consumption.

Unveiling AMD OLMo Language Models 

AMD OLMo is a series of 1 billion parameter language models pre-trained with 1.3 trillion tokens on 16 nodes, each with four (4) AMD Instinct MI250 GPUs. Along with complete details to reproduce our work, we are releasing three (3) checkpoints corresponding to the stages of training:

- AMD OLMo 1B: pre-trained on 1.3 trillion tokens.
- AMD OLMo 1B SFT: supervised fine-tuned (SFT) in two phases on a mix of publicly available instructional datasets.
- AMD OLMo 1B SFT DPO: aligned to human preferences with Direct Preference Optimization (DPO) on top of the SFT checkpoint.
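For readers who want to try a released checkpoint right away, the snippet below is a minimal inference sketch using the Hugging Face transformers library. The repository ID shown is an assumption based on the release naming and is not taken from this post; substitute the checkpoint you want to load, and note that OLMo-family models require a recent transformers version with OLMo support.

```python
# Minimal inference sketch with Hugging Face transformers.
# The repo ID below is an assumed name for one of the released checkpoints;
# replace it with the checkpoint you actually want to run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-OLMo-1B-SFT-DPO"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A plain prompt keeps the sketch simple; a chat-tuned checkpoint may work
# better with its chat template applied to the input.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the example deterministic; tune max_new_tokens,
# temperature, and top_p for real use.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```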

AMD OLMo 1B is based on the model architecture and training setup of the fully open-source 1 billion parameter version of OLMo, with some key differences. We pre-train with less than half the tokens used for OLMo-1B (effectively cutting the pre-training compute budget in half while maintaining comparable performance) and perform post-training consisting of a two-phase SFT followed by DPO alignment to enhance performance in general reasoning, instruction-following, and chat capabilities (OLMo-1B does not carry out any post-training steps). For the two-phase SFT, we create a data mix of high-quality and diverse instructional datasets that are publicly available. Overall, our training recipe produces a series of models that achieve better performance across various types of benchmarks compared to other similarly sized, fully open-source models trained on publicly available data.
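The DPO alignment stage optimizes the standard direct preference optimization objective over pairs of chosen and rejected responses. The function below is an illustrative sketch of that loss on pre-computed sequence log-probabilities; it is not the exact training code used for AMD OLMo, and the beta value is an assumed placeholder.

```python
# Illustrative sketch of the DPO objective on pre-computed sequence
# log-probabilities; the actual AMD OLMo training code is released
# separately and may differ in details.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logp: torch.Tensor,    # log p_theta(y_chosen | x)
    policy_rejected_logp: torch.Tensor,  # log p_theta(y_rejected | x)
    ref_chosen_logp: torch.Tensor,       # log p_ref(y_chosen | x)
    ref_rejected_logp: torch.Tensor,     # log p_ref(y_rejected | x)
    beta: float = 0.1,                   # assumed placeholder for the DPO temperature
) -> torch.Tensor:
    """Direct Preference Optimization loss for a batch of preference pairs."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The policy is rewarded for widening the margin between chosen and
    # rejected responses relative to the frozen reference model.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```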

Results

We compare AMD OLMo models with other similarly sized fully open-source models that have publicly released their data, model weights and training code. The pre-trained baseline models that we used for comparison include: TinyLLaMA-v1.1 (1.1B), MobiLLaMA-1B (1.2B), OLMo-1B-hf (1.2B), OLMo-1B-0724-hf (1.2B), and OpenELM-1_1B (1.1B).
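As one way to reproduce this kind of head-to-head comparison, the sketch below uses EleutherAI's lm-evaluation-harness; the task list, few-shot settings, and Hugging Face repository IDs here are illustrative assumptions rather than the exact configuration behind the reported numbers.

```python
# Sketch of a benchmark comparison with EleutherAI's lm-evaluation-harness
# (pip install lm_eval). Tasks, settings, and repo IDs are assumptions for
# illustration, not the exact evaluation setup used for the reported results.
import lm_eval

MODELS = [
    "amd/AMD-OLMo-1B",           # assumed Hugging Face repo ID
    "TinyLlama/TinyLlama_v1.1",  # assumed repo ID for one of the baselines
]

for model_id in MODELS:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id},dtype=bfloat16",
        tasks=["arc_easy", "arc_challenge", "hellaswag", "piqa"],
        num_fewshot=0,
        batch_size=8,
    )
    # results["results"] maps each task name to its metric dictionary.
    for task, metrics in results["results"].items():
        print(model_id, task, metrics)
```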

Using an end-to-end training pipeline running on AMD Instinct GPUs, consisting of a pre-training stage with 1.3 trillion tokens (half the pre-training compute budget of OLMo-1B), a two-phase supervised fine-tuning stage, and a DPO-based human preference alignment stage, AMD OLMo models are comparable to or outperform other similarly sized fully open models in general reasoning and chat capabilities, while performing on par on responsible AI benchmarks. The language models were also deployed onto AMD Ryzen AI PCs, which can help enable a diverse set of edge use cases. Open-sourcing the data, weights, training recipes, and code is primarily aimed at helping developers reproduce our results and innovate further on top of them. AMD remains committed to providing the open-source community with a steady stream of new AI models and eagerly anticipates the innovations that will emerge from their collaborative efforts.

 

To dive deeper into the three stages of training and the AMD OLMo model results, please refer to the full article here: Introducing the First AMD 1B Language Models: AMD OLMo Fuels AI Advancements