This article was originally published on October 21, 2020
Earlier today, the MLPerf organization released its latest round of machine learning (ML) inference benchmark results. Launched in 2018, MLPerf is an open-source community of more than 23 submitting organizations whose mission is to define a suite of standardized ML benchmarks. The group’s ML inference benchmarks provide a common, agreed-upon process for measuring how quickly and efficiently different types of accelerators and systems execute trained neural networks.
This marked the first time Xilinx directly participated in MLPerf. While there’s a level of gratification in just being in the game, we’re excited to have achieved a leadership result in an image classification category. We collaborated with Mipsology on our submissions in the stricter “closed” division, where vendors receive pre-trained networks and pre-trained weights for true “apples-to-apples” testing.
The test system used our Alveo U250 accelerator card running a domain-specific architecture (DSA) optimized by Mipsology. The benchmark measures how efficiently the Alveo-based custom DSA executes image classification on the ResNet-50 network, reported in images per second; our system delivered 5,011 images/second in offline mode.
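As a simplified illustration of what the offline scenario measures: the official benchmark drives the system through MLPerf’s LoadGen harness, which this toy sketch does not replicate, and `run_batch` below is a hypothetical stand-in for the accelerator’s actual inference call. Offline throughput boils down to completed inferences divided by wall-clock time:

```python
import time

def offline_throughput(run_batch, samples, batch_size=64):
    """Toy offline-scenario measurement: feed every sample as fast as
    possible and report completed images per second. `run_batch` is a
    hypothetical stand-in for the accelerator's inference call."""
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        run_batch(samples[i:i + batch_size])
    return len(samples) / (time.perf_counter() - start)
```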
We achieved the highest performance per peak TOP/s (trillions of operations per second). This is a measure of performance efficiency: given a fixed amount of peak compute in hardware, we delivered the highest throughput.
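As a back-of-the-envelope sketch of that metric, assuming a placeholder peak TOP/s figure (the value below is illustrative, not from this article; a card’s data sheet is the authoritative source):

```python
measured_images_per_sec = 5011   # MLPerf v0.7 offline ResNet-50 result above
peak_tops = 33.3                 # placeholder peak TOP/s -- NOT from this article

# Performance per peak TOP/s: higher means more of the silicon's raw
# compute is converted into delivered throughput.
efficiency = measured_images_per_sec / peak_tops
print(f"{efficiency:.0f} images/s per peak TOP/s")
```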
The MLPerf results also showed that we achieved 100% of the TOP/s available per our published data sheet. This impressive result showcases how raw peak TOP/s on paper is not always the best indicator of real-world performance: our device architectures deliver higher efficiency (effective TOP/s versus peak TOP/s) for AI applications, while most vendors on the market deliver only a fraction of their peak TOPS, often maxing out at around 40% efficiency.

More importantly, ML applications involve more than just AI processing. They typically require pre- and post-ML processing functions that compete for system bandwidth and cause system-level bottlenecks. The power of our adaptable platforms is that they allow for whole-application acceleration, accelerating these critical non-AI functions as well and building application-level streaming pipelines that avoid system bottlenecks. Our leadership result was also achieved while maintaining TensorFlow and PyTorch framework programmability, without requiring users to have hardware expertise.
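To make the streaming-pipeline idea above concrete, here is a minimal, framework-agnostic sketch (this is not the Xilinx API; `preprocess`, `infer`, and `postprocess` are placeholder callables) in which stages run concurrently and hand work off through bounded queues, so pre- and post-processing overlap with inference instead of serializing behind it:

```python
import queue
import threading

def stage(fn, inbox, outbox):
    # Pull items, transform them, and pass them downstream; a None
    # sentinel shuts the stage down and is forwarded to the next one.
    while (item := inbox.get()) is not None:
        outbox.put(fn(item))
    outbox.put(None)

def run_pipeline(images, preprocess, infer, postprocess):
    # Bounded queues provide back-pressure between stages.
    q_in, q_mid, q_out = (queue.Queue(maxsize=8) for _ in range(3))

    def feed():
        for img in images:
            q_in.put(img)
        q_in.put(None)

    workers = [
        threading.Thread(target=feed),
        threading.Thread(target=stage, args=(preprocess, q_in, q_mid)),
        threading.Thread(target=stage, args=(infer, q_mid, q_out)),
    ]
    for w in workers:
        w.start()
    results = []
    while (out := q_out.get()) is not None:
        results.append(postprocess(out))
    for w in workers:
        w.join()
    return results
```

Swapping real pre-processing, a hardware inference call, and post-processing into those placeholder slots keeps every stage busy at once, which is the bottleneck avoidance described above.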
MLPerf is quickly becoming the industry’s de facto standard for measuring ML performance. This was the second round of the MLPerf inference benchmark (v0.7), and it attracted more than 1,200 peer-reviewed results. ML inference is a rapidly growing market for applications like autonomous driving and AI-based video surveillance, which require computer vision tasks such as image classification and object detection. These complex compute workloads require different levels of throughput, latency, and power to run efficiently, and this is where Xilinx and our adaptive computing products shine brightly.
The MLPerf benchmark results underscore the high-efficiency throughput and low-latency performance our adaptive computing devices deliver for AI applications. We’re excited about these initial MLPerf results and look forward to participating in the next version.
For more information on the MLPerf Inference benchmark suite and the v0.7 results, please visit: https://mlperf.org/press#mlperf-inference-v0.7-results.