Radeon™ GPUs win big in DirectX® 12 with async compute

Last week Ashes of the Singularity™ was updated with comprehensive support for DirectX® 12 Asynchronous Compute. This momentous occasion demonstrated not only how fast Radeon™ GPUs are in DirectX® 12 games, but also how much “free” performance can be gained with our exclusive support for asynchronous compute.

A Brief Primer on Async Compute

Important in-game effects like shadowing, lighting, artificial intelligence, physics and lens effects often require multiple stages of computation before determining what is rendered onto the screen by a GPU’s graphics hardware.

In the past, these steps had to happen sequentially. Step by step, the graphics card would follow the API’s process of rendering something from start to finish, and any delay in an early stage would send a ripple of delays through future stages. These delays in the pipeline are called “bubbles,” and they represent a brief moment in time when some hardware in the GPU is paused to wait for instructions.

A visual representation of DirectX® 11 threading: graphics, memory and compute operations are serialized into one long production line that is prone to delays.

Pipeline bubbles happen all the time on every graphics card. No game can perfectly utilize all the performance or hardware a GPU has to offer, and no game can consistently avoid creating bubbles when the user abruptly decides to do something different in the game world.

What sets Radeon™ GPUs apart from their competitors, however, is the Graphics Core Next architecture’s ability to pull in useful compute work from the game engine to fill these bubbles. For example: if there’s a bubble while rendering complex lighting, Radeon™ GPUs can fill in the blank by computing the behavior of AI instead. Radeon™ graphics cards don’t need to follow the step-by-step process of the past, or of their competitors, and can do this work concurrently to keep things moving.

A visual representation of DirectX® 12 asynchronous compute: graphics, memory and compute operations decoupled into independent queues of work that can run in parallel.

Filling these bubbles improves GPU utilization, input latency, efficiency and performance for the user by minimizing or eliminating the ripple of delays that could stall other graphics cards. Only Radeon™ graphics currently support this crucial capability in DirectX® 12 and VR.
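
The bubble-filling idea can be sketched with a toy timeline model. This is a simplified illustration, not a description of how GCN hardware or DirectX® 12 actually schedules work: the function names (`serial_time`, `async_time`), the millisecond figures, and the greedy largest-first policy are all assumptions made up for the example.

```python
# Toy model only: the numbers and the scheduling policy are illustrative
# assumptions, not measurements or a description of real GPU scheduling.

def serial_time(graphics_steps, compute_jobs):
    """One queue: busy time, bubbles, and compute work all add up."""
    graphics_total = sum(busy + bubble for busy, bubble in graphics_steps)
    return graphics_total + sum(compute_jobs)


def async_time(graphics_steps, compute_jobs):
    """Separate queues: compute jobs are slotted into graphics bubbles.

    Jobs are treated as indivisible; any job that never fits into a
    bubble runs after the graphics work has finished.
    """
    pending = sorted(compute_jobs, reverse=True)  # try the largest jobs first
    graphics_total = 0
    for busy, bubble in graphics_steps:
        graphics_total += busy + bubble
        free = bubble
        leftover = []
        for job in pending:
            if job <= free:
                free -= job           # job hides inside the bubble
            else:
                leftover.append(job)  # does not fit, try a later bubble
        pending = leftover
    return graphics_total + sum(pending)


# Hypothetical frame: (busy ms, bubble ms) per graphics step, plus compute jobs.
steps = [(4, 2), (3, 1), (5, 3)]
jobs = [2, 1, 1, 3]

print(serial_time(steps, jobs))  # 25
print(async_time(steps, jobs))   # 19
```

In this made-up frame, six of the seven milliseconds of compute work disappear into bubbles that the serialized timeline would have spent idle, so the frame finishes in 19 ms instead of 25 ms.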

Ashes of the Singularity™: Async Compute in Action


AMD internal testing. System config: Core i7-5960X, Gigabyte X99-UD4, 16GB DDR4-2666, Radeon™ Software 15.301.160205a, NVIDIA 361.75 WHQL, Windows® 10 x64.

Here we see that the Radeon™ R9 Fury X GPU is far and away the fastest DirectX® 12-ready GPU in this test. Moreover, we see such powerful DirectX® 12 performance from the GCN architecture that a $400 Radeon™ R9 390X GPU ties the $650 GeForce GTX 980 Ti.[1] Up and down the product portfolios we tested, Radeon™ GPUs not only win against their equivalent competitors, they often punch well above their price points.

You don’t have to take our word for it. Tom’s Hardware recently explored the performance implications of DirectX® 12 Asynchronous Compute, and independently verified the commanding performance wins handed down by Radeon™ graphics.

“AMD is the clear winner with its current graphics cards. Real parallelization and asynchronous task execution are just better than splitting up the tasks via a software-based solution,” author Igor Wallossek wrote.

Other interesting data emerged from the Tom’s Hardware (THG) analysis, summarized briefly:

  • The Radeon™ R9 Fury X gets 12% faster at 4K with DirectX® 12 Asynchronous Compute. The GeForce GTX 980 Ti gets 5.6% slower when attempting to use this powerful DirectX® 12 feature.
  • DirectX® 12 CPU overhead with the Radeon™ R9 Fury X GPU is an average of 13% lower than with the GeForce GTX 980 Ti.
  • The Radeon™ R9 Fury X GPU is a crushing 98% more efficient than the GeForce GTX 980 Ti at offloading work from the CPU to alleviate CPU performance bottlenecks. At 1440p, for example, THG found that the Fury X spent just 1.6% of the time waiting on the processor, whereas the GTX 980 Ti waited 82.1% of the time.
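
One way to read the 98% figure, assumed here rather than confirmed by the THG article, is as the relative reduction in time spent waiting on the CPU at 1440p. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Percent of time spent waiting on the CPU at 1440p, as quoted above.
wait_980ti = 82.1
wait_fury_x = 1.6

print(f"{relative_reduction(wait_980ti, wait_fury_x):.1%}")  # 98.1%
```

Under that assumption the two quoted wait times are consistent with the headline number: (82.1 - 1.6) / 82.1 is roughly 0.98.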

Of asynchronous compute, Wallossek later concludes: “This is a pretty benchmark that serves up interesting results and compels us to wonder what's coming to PC gaming in the near future? One thing we can say is that AMD wins this round. Its R&D team, which implemented functionality that nobody really paid attention to until now, should be commended.”

We couldn't have said it better ourselves.

Robert Hallock is the Head of Global Technical Marketing at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.


1. Prices in USD as of February 29, 2016. Happy leap day!
