AMD and ONNX Release Open-Source Toolchain, TurnkeyML, for Agile Model Development and Deployment
Authors: Ramakrishna Sivakumar (AMD, ONNX Model Zoo SIG Chair), Jeremy Fowers (AMD), Daniel Holanda Noronha (AMD), Victoria Godsoe (AMD), Prasanth Pulavarthi (Microsoft, ONNX Steering Committee Member)
The landscape of machine learning and artificial intelligence is characterized by rapid, continual innovation, especially as we see an explosion of Generative AI models ranging from language generation to image synthesis. In this ever-evolving industry, there is a constant influx of new models that push the boundaries of what is possible. While exciting, this poses a significant challenge for model repositories trying to stay relevant. The primary issue lies in the maintenance and ingestion of these models. As the field advances, models that were once state-of-the-art can quickly become obsolete, struggling to keep up with newer, more efficient architectures. This challenge is compounded by the diverse and ever-expanding combinations of software stacks, hardware backends, and model precisions, each adding layers of complexity to the process of model management.
What the ONNX ecosystem realized is the need for a more dynamic approach to automating the continual integration of these new, cutting-edge model architectures. This is crucial in enabling the community to leverage the full potential of these advancements, providing easy access to the latest models optimized for a variety of platforms and applications.
Introducing TurnkeyML:
To address these challenges, AMD, in collaboration with the ONNX community, is excited to introduce TurnkeyML, an open-source toolchain designed to improve the way we handle AI models for inferencing. TurnkeyML is a comprehensive framework that streamlines the process of ingesting open-source PyTorch models, optimizing them, and executing them across a diverse set of hardware targets. And it does all this in a way that's completely transparent to the user.
The strength of TurnkeyML is its ability to automate the import and pre-processing of open-source AI models, significantly reducing the time and effort required to integrate them into the ONNX Model repo. Additionally, the tool lets users customize how a model is processed, including the target OpSet version, data type precision, and model optimizations.
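To illustrate the kind of setting TurnkeyML automates, the sketch below pins the target OpSet version during a plain torch.onnx.export call. This uses the standard PyTorch API rather than TurnkeyML's own interface, and the toy model, file name, and tensor names are placeholders.

```python
import torch
import torch.nn as nn

# Toy model standing in for any open-source PyTorch model (placeholder).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

dummy_input = torch.randn(1, 128)

# Export to ONNX while pinning the target OpSet version, one of the
# settings TurnkeyML exposes to the user.
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=17,
    input_names=["input"],
    output_names=["logits"],
)
```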
TurnkeyML Architecture:
TurnkeyML is accessible via a Command Line Interface (CLI) or an Application Programming Interface (API), catering to diverse user preferences and automation requirements. Its architecture is composed of several key components, each serving a distinct function in the model deployment and optimization pipeline:
Sequences: At the foundational level of the TurnkeyML stack, a sequence is responsible for model-to-model transformations. Examples include exporting a PyTorch model to the ONNX format using torch.onnx.export, applying model quantization techniques, and performing graph optimizations with tools like ONNX Runtime to enhance model performance.
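To make the sequence idea concrete, here is a minimal sketch of two such model-to-model transformations using standard ONNX Runtime APIs: dynamic quantization followed by saving a graph-optimized model. The file names are placeholders, and this illustrates the transformations themselves, not TurnkeyML's internal sequence interface.

```python
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Transformation 1: quantize weights from float32 to int8 (a precision change).
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# Transformation 2: apply ONNX Runtime graph optimizations and persist the result.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model_int8_optimized.onnx"
ort.InferenceSession("model_int8.onnx", sess_options)
```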
Runtimes: Serving as the execution layer, a runtime refers to the software that runs a given model on specific hardware. Example runtimes include high-performance frameworks like TensorRT™ for NVIDIA GPUs, or ONNX Runtime, which provides cross-platform support and can work with different backend hardware, such as the AMD ROCm™ execution provider for AMD GPUs.
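As a sketch of runtime selection, ONNX Runtime accepts an ordered list of execution providers; the provider names below are standard ONNX Runtime identifiers, though their availability depends on how ONNX Runtime was built for your hardware, and the model file and input name carry over from the export example above.

```python
import numpy as np
import onnxruntime as ort

# Prefer the AMD ROCm execution provider when available, falling back to CPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)

# Run one inference; the input name must match the one used at export time.
outputs = session.run(None, {"input": np.random.randn(1, 128).astype(np.float32)})
```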
Profilers: To provide insight into the efficiency of your model execution, TurnkeyML integrates with various profiling tools that allow users to analyze and optimize model performance. Users can select from a suite of profilers such as NVIDIA's System Management Interface (SMI) for GPU-based metrics or Intel's VTune Profiler for in-depth CPU profiling and performance analysis.
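For example, a lightweight way to capture GPU-side metrics during a run is to poll nvidia-smi's query interface from Python. The query fields below are standard nvidia-smi options; which fields you sample, and how often, is a choice made here purely for illustration.

```python
import subprocess

# Sample GPU utilization, memory use, and power draw via nvidia-smi's CSV mode.
result = subprocess.run(
    [
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used,power.draw",
        "--format=csv,noheader",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # e.g. "42 %, 1024 MiB, 75.00 W"
```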
Devices: TurnkeyML is designed to be device-agnostic, targeting a wide range of hardware platforms. Whether it's executing models on x86 CPUs, leveraging the parallel computation power of GPUs, or deploying to custom accelerators, TurnkeyML provides flexibility and adaptability in hardware selection. TurnkeyML can be easily updated to add additional hardware targets.
Reporting: The final component is TurnkeyML's reporting infrastructure, which consolidates and presents performance statistics, such as mean latency (ms) and throughput (IPS), from the model execution process. This reporting is vital for visualizing performance data, providing insights into efficiency, and helping users make informed decisions about model deployments.
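To show how such statistics are derived, the sketch below times repeated ONNX Runtime inferences and computes mean latency in milliseconds and throughput in inferences per second (IPS). The warm-up and iteration counts are arbitrary choices for illustration, not TurnkeyML defaults.

```python
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
x = {"input": np.random.randn(1, 128).astype(np.float32)}

# Warm up so one-time costs (session setup, allocation) don't skew the stats.
for _ in range(10):
    session.run(None, x)

# Time a batch of runs and report mean latency and throughput.
iterations = 100
start = time.perf_counter()
for _ in range(iterations):
    session.run(None, x)
elapsed = time.perf_counter() - start

print(f"Mean latency: {1000 * elapsed / iterations:.2f} ms")
print(f"Throughput:   {iterations / elapsed:.1f} IPS")
```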
TurnkeyML is designed to be a modular, plug-and-play framework, where each component can be easily added, removed, or replaced without complex integration processes. This modular architecture allows developers to quickly adapt the toolchain to their specific needs, whether it involves integrating new models, adding new model-to-model transformations, or targeting new hardware accelerators.
TurnkeyML's systematic reproducibility offers key benefits for AI development:
- Designed to Reduce User Error: Automating workflows minimizes manual errors, enhancing consistency and reliability in model handling.
- Standardizes Model Sources: It ensures uniformity in how models are ingested and processed, facilitating easier comparison and evaluation.
- Uniform Test Conditions: By standardizing testing environments, TurnkeyML allows for fair and accurate benchmarking across different models.
“We are very excited to see AMD making significant contributions to the ONNX community. ONNX is one of the key components of a production AI stack for thousands of organizations, including at Microsoft where ONNX Runtime powers our AI services. The new open-source Turnkey system developed and contributed by AMD can greatly simplify the process of getting PyTorch models ready for optimized production deployment and we’re excited to see its impact on the community.”
‒ Prasanth Pulavarthi (Microsoft), ONNX Steering Committee Member
Conclusion:
Recognizing the diverse set of challenges introduced by the rapid release of SOTA models, the ONNX ecosystem has moved to an automated approach for the continual integration of cutting-edge model architectures. This shift is not just about scaling up the ONNX Model repo; it’s about ensuring that it remains a valuable resource for developers and researchers. By automating the process of updating and validating models, the repo reflects the latest advancements in AI while maintaining a standard of reliability and performance. This step is crucial in enabling the community to leverage the full potential of these advancements, ensuring easy access to the latest models optimized for a variety of platforms and applications. TurnkeyML supports the open-source community by providing ready-made SOTA models that can easily be run across a variety of end hardware, all built on a highly extensible, vendor-agnostic framework that scales.
Call to Action:
Visit the GitHub repository at https://github.com/onnx/turnkeyml for an in-depth look at TurnkeyML, complete with detailed instructions, user guides, and the full framework, and find ONNX-ready open-source models at https://github.com/onnx/models. Your technical insights, contributions, and ideas are highly valued. Feel free to connect with our development team via amd_ai_mkt@amd.com for any technical inquiries.