Asynchronous Shaders Evolved

One of the most exciting new developments in GPU technology over the past year has been the adoption of asynchronous shaders, which make more efficient use of available hardware to extract more performance. This capability was introduced with AMD’s Graphics Core Next (GCN) GPUs, and made accessible to developers through the latest generation of graphics programming interfaces, including DirectX® 12 and Vulkan™.

For more detail on how asynchronous shaders work, you can check out the white paper here. As a quick recap, the idea behind asynchronous shaders is to allow the GPU to handle both graphics and compute tasks concurrently without having to switch back and forth between them. This allows small compute jobs to use resources that might otherwise sit idle when the main rendering workload is waiting for something else to happen, like completing a data transfer or receiving a new command from the CPU.

Modern rendering engines must execute a large number of individual tasks to generate each visible frame. Each task includes a shader program that runs on the GPU. Normally these tasks are processed sequentially in a fixed order, which is referred to as synchronous execution. Asynchronous shader technology allows more flexibility in terms of the timing and order of execution for independent tasks. When used effectively, the result is better utilization of the GPU, faster frame times, and improved responsiveness.
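
To make the difference concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the function names and millisecond figures are invented for illustration, and a frame is reduced to one graphics workload with some stall time plus one independent compute job. It is also deliberately conservative, since it only lets compute work run during graphics stalls.

```python
# Toy model of a single frame. All names and numbers are illustrative
# assumptions, not measurements from any real GPU.

def synchronous_frame_ms(graphics_busy_ms, graphics_stall_ms, compute_ms):
    """Tasks run back to back: stalls and compute both lengthen the frame."""
    return graphics_busy_ms + graphics_stall_ms + compute_ms


def asynchronous_frame_ms(graphics_busy_ms, graphics_stall_ms, compute_ms):
    """Compute runs during graphics stalls; only the overflow lengthens the frame."""
    overflow_ms = max(0.0, compute_ms - graphics_stall_ms)
    return graphics_busy_ms + graphics_stall_ms + overflow_ms
```

With 10 ms of graphics work, 3 ms of stalls and a 2 ms compute job, the synchronous model gives a 15 ms frame, while the asynchronous model hides the compute job inside the stalls and gives 13 ms.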

While the feature is already being employed in games like Ashes of the Singularity and Hitman, there is much more to come. Developers are just starting to experiment with the basic functionality, and the new wave of virtual reality applications starting to appear this year is poised to make great use of it. Meanwhile, at AMD, we have been working on enhancing the technology with the goal of making it even more powerful.

Quick Response Queue

Today’s graphics renderers provide many opportunities to take advantage of asynchronous processing, but for some applications the lack of determinism in terms of when certain tasks are executed could diminish the benefits. In these cases the renderer needs to know that a given task will be able to start and complete within a certain time frame.

To meet this requirement, time-critical tasks must be given higher-priority access to processing resources than other tasks. One way to accomplish this is pre-emption, which temporarily suspends other tasks until a designated task has completed. However, the effectiveness of this approach depends on when and how quickly an in-process task can be interrupted; task-switching overhead or other delays can hurt responsiveness and may manifest as stuttering or lag in graphics applications.

To address this problem, we have introduced the idea of a quick response queue. Tasks submitted into this special queue get preferential access to GPU resources while running asynchronously, so they can overlap with other workloads. Because the Asynchronous Compute Engines in the GCN architecture are programmable and can manage resource scheduling in hardware, this feature can be enabled on existing GPUs (2nd generation GCN or later) with a driver update.


Illustration comparing different methods of scheduling graphics and compute workloads on a GPU
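
The contrast between pre-emption and the quick response queue can be sketched as a resource split. This is a simplified illustration, not a description of how the hardware scheduler is implemented: the function names and the compute-unit counts are assumptions, and the model ignores task-switching overhead entirely.

```python
# Toy comparison of two ways to prioritize a time-critical task.
# Function names and compute-unit counts are assumptions for illustration.

def preemption_split(total_units, urgent_demand, regular_demand):
    """Pre-emption model: regular work is suspended while the urgent task runs."""
    urgent = min(urgent_demand, total_units)
    regular = 0 if urgent > 0 else min(regular_demand, total_units)
    return urgent, regular


def quick_response_split(total_units, urgent_demand, regular_demand):
    """Quick response model: urgent work is served first, the rest keeps running."""
    urgent = min(urgent_demand, total_units)
    regular = min(regular_demand, total_units - urgent)
    return urgent, regular
```

For a hypothetical GPU with 36 compute units, an urgent task that needs 8 of them leaves nothing for regular work in the pre-emption model, but leaves 28 units busy with regular work in the quick response model.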

Enabling Asynchronous Time Warp for Virtual Reality

Virtual reality rendering provides a great use case for the quick response queue. For example, the production release of the Oculus Rift VR headset implements a technique known as Asynchronous Time Warp (ATW) to reduce latency and prevent image judder caused by dropped frames.

In VR, dropped frames can occur when a frame takes too long to render and misses a refresh of the head-mounted display, causing the same image to be displayed repeatedly. The effect is jarring and destroys the sense of presence that is essential to VR. While there are a variety of ways to address this problem (including application tuning, reducing image quality, or upgrading to a more powerful graphics card), Oculus’ ATW solution is designed to be automatic and transparent to users as well as to developers.

ATW works by performing an image warp on the last frame that has finished rendering, to correct for any head movement that takes place after the rendering work is initiated. This warping operation is executed on the GPU using a compute shader, and can be scheduled asynchronously with other rendering tasks on hardware that supports that capability. Scheduling this operation every frame ensures that there is always an updated image available to display, even if it is only a warped version of a previously displayed frame.
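
As a rough feel for the size of the correction, the sketch below estimates how far the center of the image moves for a small change in head yaw. It is a one-axis approximation that assumes a simple pinhole projection; the function name and parameters are hypothetical, and an actual warp shader would reproject every pixel for the full head rotation and account for the headset optics.

```python
import math

# One-axis approximation under an assumed pinhole projection.
# Not the warp used by any particular headset runtime.

def yaw_shift_pixels(delta_yaw_deg, horizontal_fov_deg, image_width_px):
    """Horizontal shift of the image center, in pixels, for a yaw change."""
    # Focal length in pixels for the given horizontal field of view.
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return focal_px * math.tan(math.radians(delta_yaw_deg))
```

Under these assumptions, with a 90 degree field of view rendered 1000 pixels wide, a single degree of head rotation after rendering moves the image center by roughly 9 pixels, which is why displaying an uncorrected frame is noticeable.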

While great in concept, execution of the ATW task must be timed carefully in order to be useful. Ideally it should happen as late as possible in a frame interval, allowing just enough time for it to complete before the next display refresh. If it happens too early, then additional head movement can occur before the display refresh, causing a noticeable lag. If it happens too late, then it may miss the refresh and allow visible juddering to occur.
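
The timing constraint can be written down as simple arithmetic. The sketch below is illustrative only: the function names, the warp duration and the safety margin are assumptions, and times are measured in milliseconds from the previous display refresh.

```python
# Timing arithmetic for a late-scheduled warp. All names and example
# values are assumptions for illustration.

def latest_warp_start_ms(refresh_interval_ms, warp_duration_ms, safety_margin_ms=0.0):
    """Latest start time that still lets the warp finish before the next refresh."""
    start_ms = refresh_interval_ms - warp_duration_ms - safety_margin_ms
    if start_ms < 0:
        raise ValueError("warp cannot fit inside one refresh interval")
    return start_ms


def misses_refresh(start_ms, refresh_interval_ms, warp_duration_ms):
    """True if a warp started at start_ms finishes after the next refresh."""
    return start_ms + warp_duration_ms > refresh_interval_ms


def pose_age_at_refresh_ms(start_ms, refresh_interval_ms):
    """Age of the sampled head pose by the time the image is displayed."""
    return refresh_interval_ms - start_ms
```

For a 90 Hz display the refresh interval is about 11.1 ms. If the warp is assumed to take 2 ms and a 0.5 ms margin is kept, the latest safe start is about 8.6 ms into the interval. Starting at 4 ms instead would still make the refresh, but the head pose would be about 7.1 ms old when displayed rather than 2.5 ms.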

This is where the quick response queue comes into play. Putting the ATW shader on this queue gives it priority access to the GPU’s compute units, making it far more likely to complete before the next refresh even when it is submitted late in each frame interval. And since it doesn’t need to pre-empt other graphics tasks already in flight, it allows the GPU to start working on the next frame quickly.


Timeline showing how Asynchronous Time Warp tasks are scheduled concurrently with graphics tasks

This is just one example of how providing more precise control over when individual tasks execute on GPUs can open the door to entirely new ways of exploiting the massive computational power they offer. We are already experimenting with other latency-sensitive applications that can take advantage of this, such as high fidelity positional audio rendering of virtual environments on the GPU. We’re also looking at providing more scheduling controls for asynchronous compute tasks in the future. And we can’t wait to see what developers do with this next!

P.S. If you haven’t already, install the latest Radeon Software drivers to make sure you have access to all of the latest features and optimizations for your Radeon™ GPU.

David Nalasco is the Senior Technology Manager for Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.