Expanding the Versal Universe

amd_adaptivecomputing · 11-25-2022

This article was originally published on October 20, 2021.

Editor’s Note: This content is contributed by Manuel Uhm, Director, Silicon Marketing.

Xilinx hit another exciting milestone in April 2021 when we announced full production shipments for the lead Versal® AI Core and Versal Prime devices. But like our universe, the universe of all things Versal continues to expand at a rapid rate to include more:

Customer designs
Ecosystem partners, including third-party reference designs, IP, software, and OS support
Series, such as the recently launched Versal AI Edge and HBM series
Devices in multiple series
Hardened IP features
Benchmarks
Soft IP libraries
Software libraries

On that note, I want to take this opportunity to introduce you to the recent expansion of the Versal AI Core series.

New Features in the Versal AI Core Series

This most recent expansion brings two VC2xxx ACAPs to the Versal AI Core series: the VC2802 and the VC2602. If you assumed that the VC2xxx devices must mean some significant new hard IP features are being added to the AI Core series, you are absolutely correct! As shown below in red, the VC2xxx devices bring three exciting new features:

AIE-ML, the latest version of the AI Engines, which includes tightly coupled memory tiles for superior memory access and reduced latency
Integrated video decoder unit (VDU) with multiple Video Decoder Engines (VDEs)
PCIe® Gen5 support

AIE-ML and the differences with the AI Engines in the VC1xxx devices are well described in the AI Engine technology page on Xilinx.com.

The short story is that AIE-ML uses the same base architecture and tool flow as the AI Engines but has been further optimized for machine learning applications. It adds native support for INT4 and BFLOAT16, doubles the local data memory per AI Engine to 64 kB, and introduces new 512 kB memory tiles that are directly coupled to the AIE-ML array, so there is no need to use the adjoining programmable logic (PL) in the Adaptable Engines as a memory buffer. This translates to up to 4X more AI compute density at half the latency in AIE-ML compared to the AI Engines, and up to 4X greater performance/watt compared to GPUs! Each AIE-ML is roughly equivalent to 100 DSP58s, 2K LUTs, and 16 block RAMs of PL, and can save 33% in power over a PL-based implementation.
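To get a feel for what that per-engine equivalence means at scale, here is a minimal Python sketch that multiplies the quoted figures by a tile count. The names (PLResources, pl_equivalent) and the tile count of 10 are hypothetical, chosen only for illustration, and "2K LUTs" is read as 2,000, which is an assumption. The numbers are rough equivalences from this article, not device specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PLResources:
    """Programmable-logic resources, counted in whole units."""
    dsp58: int
    luts: int
    block_rams: int


# Rough per-AIE-ML equivalence quoted in the article.
# Assumption: "2K LUTs" is interpreted as 2,000 LUTs.
PL_PER_AIE_ML = PLResources(dsp58=100, luts=2_000, block_rams=16)


def pl_equivalent(num_tiles: int) -> PLResources:
    """Scale the per-tile equivalence linearly by the number of AIE-ML tiles."""
    if num_tiles < 0:
        raise ValueError("num_tiles must be non-negative")
    return PLResources(
        dsp58=PL_PER_AIE_ML.dsp58 * num_tiles,
        luts=PL_PER_AIE_ML.luts * num_tiles,
        block_rams=PL_PER_AIE_ML.block_rams * num_tiles,
    )


if __name__ == "__main__":
    # 10 tiles is an arbitrary illustrative count, not a real device figure.
    print(pl_equivalent(10))
```

Linear scaling is the simplest possible model. A real design would also depend on routing, memory placement, and how well the workload maps onto the array.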

The VDU supports H.264 and H.265 decoding, from a single 4Kp60 stream to as many as thirty-two 720p15 streams, and everything in between. If you were to implement this in the PL, you would need 120K LUTs, 50 DSP58s, and 3 block RAMs per unit! Using the hardened VDU results in 3.6 W of power savings per VDU. This is ideal for many smart video applications where multiple video cameras feed a central hub that is both decoding video and executing advanced ML algorithms.
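A quick back-of-the-envelope check shows why those two endpoints are comparable. The sketch below compares raw pixel rates, assuming 4K means 3840x2160 (UHD) and 720p means 1280x720. Neither resolution is spelled out in this article, and raw pixel rate is only a rough proxy for decoder load, so treat this as an illustration rather than a statement of how the VDU is specified.

```python
def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Total pixels per second across all streams."""
    return width * height * fps * streams


# Assumed resolutions: 4K = 3840x2160 (UHD), 720p = 1280x720.
single_4kp60 = pixel_rate(3840, 2160, 60)
many_720p15 = pixel_rate(1280, 720, 15, streams=32)

if __name__ == "__main__":
    print(f"1 x 4Kp60   : {single_4kp60:,} pixels/s")
    print(f"32 x 720p15 : {many_720p15:,} pixels/s")
    print(f"ratio       : {many_720p15 / single_4kp60:.3f}")
```

Under these assumptions, thirty-two 720p15 streams add up to roughly 442 million pixels per second, a little under the roughly 498 million of a single 4Kp60 stream.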

And, of course, PCIe Gen5 adds support for the latest PCI Express standard, soon to be deployed in data centers to enable even more bandwidth and intelligence in the cloud. Hardened PCIe Gen5 support saves 300K LUTs and 3 W of power per core.
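The per-block savings quoted for the VDU and PCIe Gen5 can be tallied for a given design. Here is a small Python sketch. The block counts in the usage example (two VDUs, one PCIe core) are hypothetical and do not describe any particular device, and the helper name total_savings is made up for illustration.

```python
from typing import Mapping, Tuple

# Savings per hardened block versus a PL implementation, as quoted in the article.
SAVINGS_PER_BLOCK = {
    "vdu": {"luts": 120_000, "watts": 3.6},
    "pcie_gen5": {"luts": 300_000, "watts": 3.0},
}


def total_savings(block_counts: Mapping[str, int]) -> Tuple[int, float]:
    """Return (LUTs saved, watts saved) for the given number of each block."""
    luts = 0
    watts = 0.0
    for name, count in block_counts.items():
        per_block = SAVINGS_PER_BLOCK[name]  # KeyError for unknown block names
        luts += per_block["luts"] * count
        watts += per_block["watts"] * count
    return luts, watts


if __name__ == "__main__":
    # Hypothetical design: two VDUs and one PCIe Gen5 core.
    luts_saved, watts_saved = total_savings({"vdu": 2, "pcie_gen5": 1})
    print(f"{luts_saved:,} LUTs and {watts_saved:.1f} W saved")
```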

Smart Applications

Naturally, all these new features in the VC2802 and VC2602 ACAPs must have a purpose! They enable the plethora of new “smart” applications being deployed at the edge and in data centers worldwide. One example is smart city applications, where multiple video cameras monitor vehicle or pedestrian traffic and real-time ML algorithms assess live traffic or perimeter security.

Another smart application is loss prevention in the retail sector, where video cameras in malls and stores can monitor in real time for stolen goods or goods that have been mislabeled at the point of sale. Shrinkage is a huge problem in the retail sector, and smart retail applications can lead to dramatically lower shrinkage rates.

Getting Started

If you’re ready to jump into the Versal universe today, I would like to suggest two evaluation and prototyping platforms as a starting point: the VCK190 kit, the first Versal AI Core Series Evaluation Kit, and the SmartLynq+ module, built for high-speed debug and trace and providing full visibility into the Versal architecture, including the AI Engines.

We also have an extensive list of documentation, examples, reference designs, resources, and methodologies to accelerate your development on the evaluation platforms. If you are new to Versal ACAPs, don’t worry! You can use our Design Flow Assistant to start planning your development, and our Design Process Hubs to easily find all documents by design process. In addition, we have a vast number of Versal and Vitis open source examples and targeted reference designs on the Xilinx GitHub.

Stay Tuned

The addition of the VC2802 and VC2602 brings exciting new capabilities to the AI Core series, and there will be more exciting news in the future. So sign up for the Versal ACAP notification list and be the first to receive updates!