
Partner: Super-Resolution: Upgrading Image Quality with AI


This article was originally published on July 1, 2020.

Editor’s Note: This content is contributed by Robert Lara, Senior Marketing Director at Mipsology.

 

super_resolution_before.png

Leveraging Super-Resolution to Improve Video and Image Quality

New content offerings generally meet the HD standard, but the same cannot be said of older TV shows and movies, or of user-generated videos posted on social media. Thankfully, there is a solution. Advanced deep learning models can now perform "super-resolution": a method that identifies the attributes of a low-quality video or image and "fills in" the missing detail to produce a higher-quality output. The result is not the true original image, but it looks more natural to the human eye. With super-resolution, a streaming service can take old content like "The Twilight Zone" and make the video quality look as if it were shot in the 21st century. And people who don't like black-and-white footage are in luck too; machine learning and neural networks will likely be used to add color to old footage someday soon. More on that in a future blog post.

 

A Strain on the Computing Resources

The largest streaming services and social media applications would ideally offer millions of videos at the highest resolution to optimize the viewing experience, but this is neither quick nor easy. Applying super-resolution to one hour of video can take 10-15 hours and requires significant computing resources. Add to this the growing demand for high-quality live streaming through services like Twitch and Zoom, which must deliver millions of high-resolution streams 24/7, without delays, at optimum performance, and compatible with any screen size: phone, tablet, or TV.

This is where Mipsology’s Zebra software solution can play a significant role for service providers looking to differentiate with high-quality video content. See the Zebra software stack image below.

inference_using_zebra.png

 

Super-Resolution: The Zebra difference

Deep learning techniques are now applied to many image- and video-related tasks, and they have proven effective for super-resolution, delivering state-of-the-art results in terms of image quality. However, neural networks for super-resolution differ from standard classification or segmentation networks in that they have massive inputs and outputs and require a huge number of calculations. Zebra leverages the high density of memory coupled with the large computing resources of FPGAs to deliver an ideal computing platform for all neural networks, including those as demanding as super-resolution.
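To get a feel for why super-resolution is so compute-hungry, consider a rough back-of-the-envelope estimate. The layer and channel counts below are illustrative figures in the spirit of the EDSR paper, not Zebra benchmarks:

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulate operations for one k x k convolution layer."""
    return h * w * c_in * c_out * k * k

# A 1080p ("1K") input frame, upscaled 2x toward 4K-class resolution.
h, w = 1080, 1920
# An EDSR-style body runs many 3x3 convolutions at full input resolution.
layers, channels = 64, 256  # illustrative depth/width, not a Zebra spec

body_macs = layers * conv_macs(h, w, channels, channels)

print(f"Input pixels:        {h * w:,}")            # 2,073,600
print(f"Output pixels (2x):  {(2 * h) * (2 * w):,}")  # 8,294,400
print(f"Body MACs per frame: {body_macs / 1e12:.1f} tera-MACs")
```

Tens of tera-MACs per frame, multiplied by thousands of frames per hour of video, is what makes the 10-15 hours of processing per hour of content plausible on general-purpose hardware.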


A well-proven neural network for creating such high-resolution images is EDSR (https://arxiv.org/abs/1707.02921), whose structure is shown below:

The architecture of the proposed single-scale SR network (EDSR)

Zebra streamlines the process of super-resolution and eases the computing load, enabling content and streaming providers to achieve their high-quality video and image goals. Using multiple Xilinx Alveo™ accelerator cards in a computer, Zebra achieves a high density of computing that reduces infrastructure cost: one Xilinx Alveo-enabled server does the job of three GPU-enabled servers. Based on 8-bit integer computing and a proprietary, efficient quantization, Zebra accelerates the inference of neural networks like EDSR to create high-quality 2K or 4K content from 1K video and enable live streaming, all on a single computer. Not only does this reduce the initial cost, but it also reduces installation and data center costs, and it greatly simplifies the software, since the video streams do not need to be spread over multiple hosts.
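A distinctive piece of EDSR's architecture is its sub-pixel convolution ("pixel shuffle") upsampler, which rearranges channels into spatial resolution rather than interpolating pixels. A minimal NumPy sketch of that rearrangement (our own illustration, not Zebra's API):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r).

    This depth-to-space step is how EDSR turns features computed at
    low resolution into the final high-resolution image.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)    # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# 2x upscale: 12 feature channels at 4x4 become 3 RGB channels at 8x8.
features = np.arange(12 * 4 * 4, dtype=np.float32).reshape(12, 4, 4)
hi_res = pixel_shuffle(features, r=2)
print(hi_res.shape)  # (3, 8, 8)
```

Because the network body runs at the low input resolution and only this cheap rearrangement produces the enlarged output, the expensive convolutions never have to operate on 4K-sized tensors.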

real_time_live_streaming_super_resolution.png
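Zebra's 8-bit integer inference depends on quantizing floating-point weights and activations. Its proprietary scheme is not public, but a generic symmetric per-tensor int8 quantizer, shown purely for illustration, works like this:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization of float values to int8."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float values."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(np.max(np.abs(weights - restored)))  # error bounded by scale / 2
```

The payoff is that int8 multiply-accumulates are far cheaper in silicon than float32 ones, which is how an FPGA can pack the computing density needed for real-time EDSR inference.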

A Zebra result of running EDSR is displayed below, using the "0825 The band" image from the DIV2K dataset (NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study).
The image before processing is shown at the top left and the processed image at the bottom left. Because 4K precision is hard to convey in an article, we have zoomed into the same area of each image: the top right shows the original image enlarged with a classical bicubic algorithm, while the bottom right shows the same crop of the actual 4K image produced by Zebra running EDSR.

quality_of_image_after_edsr.png

Conclusion

Zebra-enabled Xilinx FPGA-based platforms provide a simple processing infrastructure that reduces the cost of creating high-definition content compared to competitive hardware. FPGA-based hardware has a long lifespan and is highly reliable, enabling 24/7 services to run with low maintenance costs and no interruption. This is essential for companies looking to upgrade thousands of movies, shows, and short videos.


Zebra’s high-performance AI acceleration engine is plug-and-play: it requires no changes to the neural network and can be deployed immediately for inference while keeping the existing training. This is important for two reasons. First, it saves an incredible amount of time and cost; second, Zebra’s unique IP delivers the high-quality content required by today’s commercial-grade super-resolution applications.