Today is a momentous day for the team behind AMD Radeon™ graphics: we are proud to announce the official public availability of the AMD Catalyst™ 14.1 Beta driver. The driver includes the Mantle libraries needed by the Mantle-enabled patch for Battlefield 4™ and by the recently released Mantle-based “Star Swarm” technical demo from Oxide Games. In addition, AMD Catalyst™ 14.1 delivers the promised “phase 2” frame pacing fixes for AMD CrossFire™ technology users!

 

A BRIEF PRIMER ON MANTLE

Mantle has been many years in the making by AMD, but we were not alone in this effort! Mantle was also directly shaped by the input we received from the greater game development community that has long sought a low-level graphics API for PCs.  We worked shoulder-to-shoulder with developers like DICE and Oxide Games to create Mantle in the image of their needs: a streamlined, robust, efficient API for modern graphics work. In fact, Mantle is the very first API designed directly by game developers for their modern craft!

 

At the simplest level, Mantle is an Application Programming Interface (API), or a language that game developers can use to write code that creates the beautiful graphics on your screen. In its current iteration, the Mantle API uniquely leverages the hardware in the Graphics Core Next architecture (GCN) of modern AMD Radeon™ GPUs for peak performance.

 

More broadly, Mantle is functionally similar to DirectX® and OpenGL, but Mantle is different in that it was purpose-built as a lower level API. By “lower level,” we mean that the language of Mantle more closely matches the way modern graphics architectures (like AMD’s own GCN) are designed to execute code. The primary benefit of a lower level API is a reduction in software bottlenecks, such as the time a GPU and CPU must spend translating, interpreting and reorganizing code on the fly before it can be executed and presented to the user as graphics. Mantle stands in contrast to the “high level API,” which offers broader compatibility with multiple GPU architectures, but does so at the expense of performance and efficiency.

 

Before entering into performance analysis, however, we wanted to provide some new insight on the design goals of the Mantle API.

 

DESIGN PRINCIPLES OF MANTLE

First and foremost, Mantle is designed to improve performance in scenarios where the CPU is the limiting factor (so-called “CPU-bound” cases). CPU-bound scenarios are legion in gaming, as existing APIs typically have heavy validation overhead, along with difficulty scaling out to multiple CPU cores. In addressing this common problem, Mantle can enable a pronounced improvement for the majority of PC gamers worldwide who have entry-level and mid-range processors. Some of the techniques to achieve this include:

  • Low-overhead validation and processing of API commands
  • Explicit command buffer control
  • Close to linear performance scaling when recording command buffers on multiple CPU cores
  • Reduced runtime shader compilation overhead
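
To make the multi-core recording idea concrete, here is a minimal sketch in plain C++. It is not Mantle code: `DrawCommand`, `CommandBuffer`, `record_parallel` and `submit_in_order` are hypothetical names invented for this illustration. The only point is that each CPU thread records into its own buffer, so recording needs no locks, and the buffers are then handed over in a fixed order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Mantle types.
struct DrawCommand { int object_id; };
using CommandBuffer = std::vector<DrawCommand>;

// Record one draw command per object using `worker_count` threads.
// Each thread fills its own command buffer, so no locking is required.
std::vector<CommandBuffer> record_parallel(int object_count, int worker_count) {
    assert(object_count >= 0 && worker_count > 0);
    std::vector<CommandBuffer> buffers(static_cast<std::size_t>(worker_count));
    std::vector<std::thread> workers;
    // Objects per worker, rounded up so every object is covered.
    const int chunk = (object_count + worker_count - 1) / worker_count;

    for (int w = 0; w < worker_count; ++w) {
        const int begin = std::min(object_count, w * chunk);
        const int end = std::min(object_count, begin + chunk);
        workers.emplace_back([&buffers, w, begin, end] {
            for (int id = begin; id < end; ++id) {
                buffers[static_cast<std::size_t>(w)].push_back(DrawCommand{id});
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return buffers;
}

// "Submit" the buffers in worker order; in this sketch that simply means
// concatenating them into one queue.
std::vector<DrawCommand> submit_in_order(const std::vector<CommandBuffer>& buffers) {
    std::vector<DrawCommand> queue;
    for (const CommandBuffer& cb : buffers) {
        queue.insert(queue.end(), cb.begin(), cb.end());
    }
    return queue;
}
```

Calling `record_parallel(10, 4)` followed by `submit_in_order` yields the ten commands in their original object order, even though four threads recorded them independently. A real engine would record far heavier state per command, which is where the per-core scaling matters.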

 

In turn, Mantle makes less of an impact in cases where high resolutions and “maximum detail” settings are used, as these settings are likely to saturate GPU resources in a manner that is more difficult to improve at the API level (so-called “GPU-bound” scenarios). While Mantle provides some built-in features to improve GPU-bound performance, gains in these cases depend largely on how well the developer uses Mantle’s features and optimizations. Some of those features include:

  • Reduction of command buffer submissions
  • Explicit control of resource compression, expansion and synchronization
  • Asynchronous DMA queue for data uploads independent from the graphics engine
  • Asynchronous compute queue for overlapping of compute and graphics workloads
  • Data format optimizations via flexible buffer/image access
  • Advanced Anti-Aliasing features for MSAA/EQAA optimizations

 

It’s also prudent to note that Mantle is still in the beta phase, so today’s results may not reflect the full performance we may be able to achieve through the optimization work we’ll be investing in the months ahead. And, as developers are still familiarizing themselves with Mantle and its relationship to Graphics Core Next, they may not yet have capitalized on all available opportunities for optimization, but that will come with time.

 

One such optimization is multi-GPU performance scaling, which in the Mantle ecosystem now rests in the hands of the game developer. This control empowers developers to design a multi-GPU codebase that matches the approach their rendering engine takes to graphics. Battlefield 4 is currently enabled with multi-GPU capabilities on Mantle, while the Oxide Games Star Swarm demo will be enabled with these capabilities in a later build.

 

MANTLE ON BATTLEFIELD 4

A game that needs no introduction, Battlefield 4 has captivated gamers with its intense multiplayer environments and, of course, the tremendous graphics courtesy of the game’s Frostbite 3™ engine. Battlefield 4 has been augmented this week with Mantle, and activating our API requires that you meet a few prerequisites:

Once these requirements have been met, you’re ready and Mantle is enabled! The following diagrams illustrate the performance uplift you can expect from Mantle across a variety of processors. The data demonstrates what we’ve been promising since Mantle’s unveiling: a performance uplift across every scenario.

 

BF4_8350.png

BF4_7700K.png

BF4_4960X.png

BF4_4670K.png

 

MANTLE ON STAR SWARM

Star Swarm, meanwhile, is a technical demo developed by the incredible minds over at the new Oxide Games. Composed of industry veterans from Firaxis and Microsoft studios, the experienced artists and developers at Oxide have crafted the “Nitrous” engine to power a new generation of RTS titles, like the Star Swarm demo, with massive battlefields and a huge quantity of on-screen units. That makes it a perfect use case for Mantle!

 

Star Swarm is important because it shows the impact Mantle can make in scenarios with a high number of “draw calls,” the requests the CPU must send to the GPU to display each object you see on screen. Traditional game engines based on DirectX® typically hit a limit somewhere around 5000 draw calls (or lower, depending on the user’s CPU), and show severe performance degradation beyond that point.

 

This performance degradation is due largely to the inability to efficiently utilize multiple CPU cores, which artificially limits the speed at which the CPU and GPU can communicate to do meaningful work. In contrast, the Star Swarm demo from the team at Oxide uses Mantle’s efficient multi-core scaling to raise that limit into the neighborhood of 100,000 draw calls! With that in mind, tests performed on a range of CPUs demonstrate impressive performance improvements, even with significantly higher visual fidelity.
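
As a back-of-the-envelope illustration of why per-call CPU cost and core count set the draw call ceiling, consider the simple model below. It is a sketch under stated assumptions, not a description of any real driver: the function name `max_draw_calls` is hypothetical, each call is assumed to cost a fixed amount of CPU time, and scaling across cores is assumed to be perfectly linear.

```cpp
#include <cassert>

// Illustrative model only: how many draw calls fit in one frame's CPU budget
// if each call costs a fixed amount of CPU time and recording is spread
// evenly across cores. Every input is a hypothetical value, not a measurement.
long max_draw_calls(double target_fps, double cost_per_call_us, int recording_cores) {
    assert(target_fps > 0.0 && cost_per_call_us > 0.0 && recording_cores > 0);
    const double frame_budget_us = 1e6 / target_fps;  // CPU time available per frame
    // Idealized linear scaling: N cores record N times as many calls.
    return static_cast<long>(frame_budget_us * recording_cores / cost_per_call_us);
}
```

With made-up inputs of 50 frames per second and 4 microseconds per call on a single core, the model allows 5,000 calls per frame. Cutting the cost to 1 microsecond per call and recording on four cores allows 80,000. The figures are chosen only to make the arithmetic easy; the takeaway is that lower per-call overhead and multi-core recording multiply together.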


To test the Star Swarm demo for yourself, you’ll need to meet the following prerequisites:

 

Oxide_RTS_7700K.png

Oxide_RTS_4670K.png

Oxide_RTS_4960X.png

 

 

UPDATED FRAME PACING

Frame pacing is a technology that allows a multi-GPU configuration to deliver each frame of your game at consistent intervals. An even interval between frames imparts a certain buttery smoothness that’s hard to describe in words, but luxurious to experience in action in your favorite game.
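
To see why consistent intervals matter independently of frame rate, the sketch below compares the average and the spread of a list of frame times. It is an illustration only: `FrameStats` and `frame_stats` are hypothetical names, and the sample values mentioned afterwards are invented, not measured.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Summary of a run of frame times, all in milliseconds.
struct FrameStats {
    double mean_ms;    // average frame time (what an FPS counter reflects)
    double stddev_ms;  // spread around the average (what pacing reduces)
};

// Compute the mean and population standard deviation of the frame times.
FrameStats frame_stats(const std::vector<double>& frame_times_ms) {
    assert(!frame_times_ms.empty());
    const double count = static_cast<double>(frame_times_ms.size());
    double sum = 0.0;
    for (double t : frame_times_ms) sum += t;
    const double mean = sum / count;

    double squared_error = 0.0;
    for (double t : frame_times_ms) squared_error += (t - mean) * (t - mean);
    return FrameStats{mean, std::sqrt(squared_error / count)};
}
```

For example, the invented sequences `{20, 20, 20, 20}` and `{5, 35, 5, 35}` both average 20 ms per frame, so a frame rate counter would report the same number for each. Their standard deviations are 0 ms and 15 ms respectively, and it is that difference the eye perceives as smoothness or stutter.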

 

Last year, we developed and released our own implementation of this technology for customers of AMD Radeon™ products in the AMD Catalyst™ 13.8 Beta driver. In that update, we enabled frame pacing for Graphics Core Next-based GPUs on resolutions up to 2560x1600 in DirectX® 10 and 11 applications. With that release, we also promised a “phase 2” driver that would address additional configurations, and we’re pleased to say that today is the day: AMD Catalyst™ 14.1 Beta is the promised “phase 2” driver!

 

AMD Catalyst™ 14.1 Beta brings new frame pacing support to the following AMD CrossFire™ or AMD Dual Graphics configurations when running DirectX® 10 and DirectX® 11 applications at resolutions higher than 2560x1600:

  • AMD Radeon™ R9 280X Graphics
  • AMD Radeon™ R9 270X Graphics
  • AMD Radeon™ R9 270 Graphics
  • AMD Radeon™ HD 7000 Series
  • AMD Radeon™ HD 7000M Series
  • AMD Radeon™ HD 8000 Series
  • AMD Radeon™ HD 8000M Series
  • AMD Dual Graphics configurations using the new “Kaveri” APU with the AMD Radeon™ R7 240 or R7 250 GPUs

 

As a visual example of the benefits provided by frame pacing, we can plot out a real-world gaming scenario. The following diagram represents the time, in milliseconds, it took to render and display each frame to the user. To interpret the graph, look at the peaks and valleys: a line with fewer and smaller swings between peaks and valleys, and particularly fewer dramatic spikes, represents a smoother gaming experience.
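
One rough way to put a number on those peaks and valleys is to count how often consecutive frames differ by more than some threshold. The helper below is a hypothetical sketch for readers who log their own frame times; `count_swings` and the threshold are assumptions made for illustration, not a metric AMD has defined.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Count frame-to-frame swings larger than `threshold_ms`.
// A swing is the absolute difference between two consecutive frame times.
std::size_t count_swings(const std::vector<double>& frame_times_ms, double threshold_ms) {
    assert(threshold_ms >= 0.0);
    std::size_t swings = 0;
    for (std::size_t i = 1; i < frame_times_ms.size(); ++i) {
        const double delta = std::fabs(frame_times_ms[i] - frame_times_ms[i - 1]);
        if (delta > threshold_ms) ++swings;
    }
    return swings;
}
```

A flat trace produces a count near zero, while a trace that alternates between short and long frames produces a swing on almost every frame. The threshold is a judgment call that depends on the frame rate being targeted.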

 

The image below shows an AMD Dual Graphics configuration, pairing the new “Kaveri” APU with the AMD Radeon™ R7 250, running Tomb Raider™. As you can see, the time to present a frame is noticeably improved with the AMD Catalyst™ 14.1 graphics driver (red line), which is consistently and significantly less erratic than its predecessor (blue line).

 

tr_dgfx_frametimes.png

 

 

THE ROAD AHEAD

In the months since our October 2013 unveiling of Mantle, you have been patient and kind to us as the Mantle consortium labored to make the first release the best it could possibly be. All the while, your enthusiastic support on Facebook, Twitter and at trade shows has been real and personal encouragement for every person working on the API. While we can never truly repay your kindness with a piece of software, we hope that it goes into the world with no uncertain amount of gratitude from us. We thank you so very deeply for your support, and vow that we will bring support to the full breadth of Graphics Core Next GPUs in the months ahead.

 

More broadly, AMD Catalyst™ 14.1 can be a significant improvement for gamers with systems based on AMD CrossFire™ or Dual Graphics technologies, with the potential for vast enhancements to the overall fluidity of their DirectX® 10 and 11 gaming experiences.

 

As this is a beta driver, we’d like to sign off by noting that your feedback is vital to the future of the AMD Catalyst™ driver. Please report any bugs or issues through our official reporting form!

 


Robert Hallock is PR Manager for Gaming & Desktop Graphics at AMD.  His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only.  Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


SYSTEM CONFIGURATIONS FOR BATTLEFIELD 4™ AND STAR SWARM:

  • Intel Core i7-4960X System: Intel X79 chipset, 16GB DDR3-1600 RAM, Windows 7 x64, AMD Catalyst™ 14.1 graphics driver
  • Intel Core i5-4670K System: Intel Z87 chipset, 16GB DDR3-1600 RAM, Windows 7 x64, AMD Catalyst™ 14.1 graphics driver
  • AMD FX-8350 System: AMD 990FX chipset, 16GB DDR3-1600 RAM, Windows 7 x64, AMD Catalyst™ 14.1 graphics driver
  • AMD A10-7700K System: AMD A88X chipset, 16GB DDR3-1600 RAM, Windows 7 x64, AMD Catalyst™ 14.1 graphics driver


SYSTEM CONFIGURATION FOR TOMB RAIDER™ FRAME PACING:

  • AMD A10-7850K, AMD Radeon™ R7 250, 16GB DDR3-1600 RAM, Windows 7 x64, AMD Catalyst™ 14.1 graphics driver, Resolution: 1920x1080, Preset: High Quality