This article was originally published on February 21, 2022
Editor's Note: This content is contributed by Rehan Tahir, Sr. Product Line Manager for Versal AI Edge ACAPs at AMD
Here at Xilinx, now AMD, we’re really excited about AI Engine technology in Versal® ACAPs (if you couldn’t already tell) given its importance in delivering high-performance adaptive computing in the many markets that the combined Xilinx + AMD now serves. The Versal AI Core and AI Edge series, with this AI Engine technology, are well suited for use cases where compute acceleration is a key requirement.
AI Engines are truly versatile in their ability to deliver performance improvement for Machine Learning (ML) as well as Signal Processing applications. We have talked in depth about the 4X AI performance per watt that the AI Engines for Machine Learning (AIE-ML) deliver vs. GPUs. But today, we’re going to focus on the signal processing side of things, with an emphasis on the Fast Fourier Transform (FFT).
Unless you’re a DSP specialist, you may vaguely recall FFTs from your Intro to Signal Processing class from undergrad. Luckily, you won’t need to dust off your textbook to implement FFTs on AI Engines because the software does the hard work for you (more on that later). However, it’s important to remember that the FFT is one of the most important algorithms ever developed, enabling nearly all the image & audio compression and digital communication that we use in our daily lives.
Narrowing things down a bit further, let’s review how FFTs can be used in applications like LiDAR (light detection and ranging) and radar (radio detection and ranging). A LiDAR or radar emitter transmits a sinusoidal signal with a frequency that varies over time. That signal bounces off objects and reflects back to a receiver. By analyzing the characteristics of the received waveform, an object’s position and relative velocity can be calculated. This is done by using an FFT to look at the spectral content of the reflected signal. Because the transmitted frequency is swept over time, the frequency offset between the echo and the signal being transmitted at that moment is proportional to the echo’s round-trip delay, and therefore to the object’s distance, while the amplitude of the corresponding spectral peak indicates how strong the reflection is. Through this conversion from time domain to frequency domain, a series of signals can generate meaningful representations of a scene, like 2D or 3D point clouds or 4D radar images. For automotive applications, aggregate FFT throughput is relatively low, requiring less than 1 giga-sample-per-second (GSPS). For certain radar or communication applications, you may need a sample rate as high as 10–15 GSPS.
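To make the range measurement concrete, here is a minimal sketch in plain Python of the idea described above. It is only an illustration, not AI Engine code: the chirp bandwidth, sample rate, FFT size, and the helper names `beat_signal` and `estimate_range` are all assumptions chosen for the example, and the small recursive `fft` stands in for an optimized library kernel. The sketch simulates the beat tone produced by a single target and recovers the target's range from the FFT peak.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def beat_signal(target_range_m, bandwidth_hz, n_samples, sample_rate_hz):
    """Hypothetical helper: ideal, noise-free beat tone for one target.

    Assumes a linear frequency sweep of bandwidth_hz over the whole
    sampling window, so the beat frequency is 2 * R * slope / c.
    """
    chirp_time_s = n_samples / sample_rate_hz
    slope_hz_per_s = bandwidth_hz / chirp_time_s
    f_beat_hz = 2.0 * target_range_m * slope_hz_per_s / C
    return [
        math.cos(2.0 * math.pi * f_beat_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


def estimate_range(samples, bandwidth_hz, sample_rate_hz):
    """Hypothetical helper: range of the strongest FFT peak, in meters."""
    n = len(samples)
    spectrum = fft(samples)
    # The input is real, so only the first half of the spectrum is unique.
    # Bin 0 (DC) is skipped.
    peak_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    f_beat_hz = peak_bin * sample_rate_hz / n
    slope_hz_per_s = bandwidth_hz / (n / sample_rate_hz)
    return C * f_beat_hz / (2.0 * slope_hz_per_s)


# Assumed example parameters (not taken from any specific sensor).
BANDWIDTH_HZ = 150e6
SAMPLE_RATE_HZ = 1e6
N_SAMPLES = 256

resolution_m = C / (2.0 * BANDWIDTH_HZ)  # width of one range bin
true_range_m = 40 * resolution_m         # target placed exactly on bin 40
echo = beat_signal(true_range_m, BANDWIDTH_HZ, N_SAMPLES, SAMPLE_RATE_HZ)
estimated_m = estimate_range(echo, BANDWIDTH_HZ, SAMPLE_RATE_HZ)
print(f"true range {true_range_m:.2f} m, estimated {estimated_m:.2f} m")
```

In this simplified model each FFT bin corresponds to one range cell of width c / (2 × bandwidth), so a target that does not sit exactly on a bin center is reported at the nearest cell. A real system would add windowing, noise handling, and a second FFT across successive chirps to estimate velocity.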
Now that we’ve talked about the high-level functionality of an FFT, we’ll shift to the benefits of implementing FFTs with AI Engines.
The point cloud or 4D radar that you generated using FFTs can be fed back into the AI Engines to perform CNN ML functions. Vitis™ AI already offers models in our Model Zoo to implement bounding boxes with PointPillars for object detection and/or SalsaNext for semantic segmentation. This is made possible because the resources required to perform the FFT are minimal – leaving most AI Engines available to accelerate your application with ML.
From a power perspective, AI Engines are much more efficient than programmable logic for this workload. The power reduction is roughly 50%, and delivering the same throughput at half the power amounts to a 2X improvement in performance per watt.
The AI Engine array architecture allows for memory sharing, which results in a higher number of FFT computations. This is highlighted in the table below, where three Zynq® UltraScale+™ MPSoC ZU3 devices are needed to compute a 64-channel 2K x 1K 2D FFT, while the same computation can be performed on a single Versal AI Edge VE2102 device.
Our GitHub site has a wonderful tutorial that goes through the process of implementing a 2D FFT on the VCK190 Evaluation Kit. The tutorial walks you through the design process while going into the details behind the hardware and software implementation. The key to the simplicity of this tutorial is the FFT kernel – available within the Vitis DSP Library. Performance details are available to help you understand the resource utilization, latency, throughput, and power for each design variation. These design resources will also be available to implement FFTs on the Versal AI Edge series.
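If you would like some intuition for what a 2D FFT computes before opening the tutorial, the sketch below shows the common row-column approach in plain Python: run a 1D FFT over every row, then a 1D FFT over every column of the result. This is not the tutorial's code or the Vitis DSP Library kernel; the function names `fft` and `fft2d` and the tiny 4 x 8 input are assumptions made purely for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(matrix):
    """2D FFT by row-column decomposition.

    matrix is a list of equal-length rows; both dimensions must be
    powers of two. Returns a list of rows of complex values.
    """
    # Step 1: 1D FFT along every row.
    row_pass = [fft(row) for row in matrix]
    # Step 2: transpose, then 1D FFT along every original column.
    col_pass = [fft(list(col)) for col in zip(*row_pass)]
    # Step 3: transpose back to the original row/column layout.
    return [list(row) for row in zip(*col_pass)]


# Example input: a single 2D complex sinusoid on a 4 x 8 grid. Its
# energy should land in one output bin, at row U0 and column V0.
ROWS, COLS = 4, 8
U0, V0 = 1, 3
image = [
    [cmath.exp(2j * math.pi * (U0 * r / ROWS + V0 * c / COLS))
     for c in range(COLS)]
    for r in range(ROWS)
]
spectrum = fft2d(image)
peak = max(
    ((r, c) for r in range(ROWS) for c in range(COLS)),
    key=lambda rc: abs(spectrum[rc[0]][rc[1]]),
)
print("peak bin (row, column):", peak)
```

The transpose between the two passes is the part that makes large 2D FFTs demanding on memory, since every column FFT needs data from every row.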
Join the early access program if you want to learn more about AI performance with the Versal AI Edge series. Contact your local Xilinx representative or contact sales to get started.