
AMD Gaming

24 Posts authored by: rhallock

Great computing experiences don’t just happen—they’re AMD-enabled.

 

The latest major release of our drivers is available as a free upgrade for AMD customers. In addition to releasing new versions of the system software at regular intervals, AMD today released an AMD Catalyst™ Omega special edition software update that includes enhancements to enrich the user experience.

 

Why? Today’s hardware and software have become highly interconnected and interdependent, dynamically interacting to form a cohesive computing unit. This symbiotic relationship between hardware and software is vital to the ongoing evolution of future computing devices. New software becomes incorporated into an existing generation of hardware, enabling faster, more capable, and more reliable performance.

 

WHAT IS THE AMD CATALYST™ OMEGA DRIVER?

Last year alone, AMD Catalyst™ drivers were downloaded more than 80 million times — and we are thrilled that millions of customers are enjoying the benefits of our new software. Giving them something extra-special this time of year is the best way to thank them for their continuing support, and show our appreciation for being part of our AMD community.

 

Our software team has worked hard to enrich the user experience, and create a remarkable environment for developers by providing them the ability to create incredible new apps. The AMD Catalyst Omega driver was engineered to take full advantage of the advanced technologies built into AMD’s products that feature GCN Architecture, and help make them more powerful and capable.


Extra performance — no extra cost

Think of the last time a product you purchased actually improved over time. AMD Radeon™ graphics and AMD A-series APUs featuring GCN architecture can get easy software upgrades that boost performance, enhance reliability, and help reduce heat and energy consumption. Installing the AMD Catalyst Omega driver on select AMD products enables free software upgrades that install automatically and can improve your gaming performance.

 

For example, early buyers of an AMD Radeon™ R9 290X GPU who download and install the AMD Catalyst Omega driver can realize up to 19% faster* gameplay on BioShock Infinite.

 

Similarly, users of AMD’s advanced APUs like the AMD A10 7850K can achieve up to 29% faster** gaming performance on Batman: Arkham Origins.
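The two percentages quoted above can be checked against the frame rates listed in the footnotes of this post. The snippet below is only a quick arithmetic check; the function name is ours and is not part of any AMD tool.

```python
def percent_uplift(before_fps: float, after_fps: float) -> float:
    """Return the relative frame-rate gain as a percentage."""
    return (after_fps - before_fps) / before_fps * 100.0


# Frame rates taken from the footnotes of this post.
bioshock = percent_uplift(30.47, 36.24)  # R9 290X, BioShock Infinite
batman = percent_uplift(34.96, 45.2)     # A10 7850K, Batman: Arkham Origins

print(f"BioShock Infinite: {bioshock:.1f}% faster")
print(f"Batman: Arkham Origins: {batman:.1f}% faster")
```

Rounded to whole numbers, these match the "up to 19%" and "up to 29%" figures in the text.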

 

Great software brings out the best of great hardware

The AMD Catalyst Omega driver extracts the true potential of GCN-enabled AMD APUs and GPUs. Here are a few examples of new AMD Catalyst Omega driver capabilities:

  • Enabling the UltraHD revolution: UltraHD TVs and monitors are now available, and becoming much more affordable. UltraHD displays demand UltraHD content—but very little content or entertainment is being recorded in 4K at this time. The good news: we are offering built-in Ultra HD upsampling with frame rate conversion and HD detail enhancement that will convert 1080p videos to near UltraHD quality on 4K displays.

  • Perfect Picture UltraHD: Our Perfect Picture UltraHD technology strives for “pixel-perfect” images, with Compression Artifact Removal 2, and Frame Rate Conversion for Blu-ray Playback enabling pixel-by-pixel image processing.

  • What is better than even more powerful? Smoother: There are many reasons that make AMD APUs and AMD GPUs a match made in heaven. But one of the major ones is that one brings out the best in the other. When select products are paired together through Dual Graphics with frame pacing enhancements, the powerful gameplay becomes smooth.

 

Here are examples of AMD enabling developers to deliver outstanding user experiences:

 

  • OpenCL™ 2.0 Support: Enabling developers to extend the reach of their app content and functionality based on industry standards.

  • TressFX Hair 3.0: Introducing new gaming capabilities for game developers with TressFX, such as rendering of fur onto “skinned” geometries.

  • CodeXL Tools: A comprehensive tools suite for the performance-aware developer to Debug, Profile and Analyze applications. Also included is a realtime display of APU power consumption that collects data on power consumption, core frequency, temperature changes and voltage and current levels.

 

Testing quantity delivers exceptional product quality

The benefits of “quality vs. quantity” are frequently debated — except when it comes to delivering an exceptional user experience, where the quality of a product heavily depends on the quantity of product testing. This is why every AMD Catalyst™ driver release undergoes exhaustive testing to uncover and fix hidden flaws and make the user experience as intuitive, reliable, and enjoyable as possible.

 

Testing the AMD Catalyst Omega driver required executing around 65% more automated and 10% more manual test-cases, utilizing 10% more varied system configurations, with 10% more different display makes and models.*** However, we did not stop there.

 

Our community managers asked six of the largest PC communities to share their candid feedback about our AMD Catalyst™ drivers, and report on the issues they discovered. Our dedicated QA teams worked on reproducing, debugging, and fixing these issues. Every AMD Catalyst™ driver release undergoes exhaustive testing, but we set the bar even higher with this latest driver release, all to ensure a user experience as intuitive, reliable, and enjoyable as possible.


For all the aforementioned reasons, the AMD Catalyst Omega special edition driver is the biggest and the best software upgrade AMD has released this year. It’s our way of saying ‘Thank you’ and Happy Holidays.

 

Sasa Marinkovic is Head of Software Marketing for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

 

FOOTNOTES:

* Intel Core i7 4960X with 16GB DDR3-1866, AMD Radeon™ R9 290X, Windows 8.1 64bit comparing launch driver 13.12 vs Driver 14.501. Tests run at 3840x2160. BioShock Infinite @ ultra scored 30.47 vs 36.24 fps.

** AMD A10 7850K with R7 graphics, 2x4GB DDR3 2400, Windows 8.1 64bit comparing Catalyst 14.2 vs Driver 14.50. In Batman: Arkham Origins @ 1080P,  PHYSX=off GEOMETRYDETAIL=normal DYNAMICSHADOWS=normal MOTIONBLUR=off DOF=normal DISTORTION=off LENSFLARES=off LIGHTSHAFTS=off REFLECTIONS=off AO=normal we see an uplift from 34.96 fps to 45.2 fps.

*** Compared to previous driver release

NOTE: Gamers with Mantle-enabled AMD Radeon™ graphics cards or AMD APUs must have AMD Catalyst™ 14.9.2 Beta (or newer) installed in their system. The game will allow users to select Mantle at runtime. This driver is available here.

 

Friends, diplomats, would-be bureaucrats, today is a truly exciting day in the history of PC gaming: we Sid Meier’s Civilization® addicts have an all-new Civ game to play! Before you commit to one more turn and push your bed time back by five hours, please join us in exploring the day-one Mantle support in Sid Meier’s Civilization®: Beyond Earth™.1

 

A GAME THAT SCARCELY NEEDS AN INTRODUCTION

Sid Meier's Civilization: Beyond Earth is a new science-fiction-themed entry into the award-winning Civilization series. Set in the future, global events have destabilized the world leading to a collapse of modern society, a new world order and an uncertain future for humanity. As the human race struggles to recover, the re-developed nations focus their resources on deep space travel to chart a new beginning for mankind.

 

introbanner.jpg

As part of an expedition sent to find a home beyond Earth, you will write the next chapter for humanity as you lead your people into a new frontier and create a new civilization in space. Explore and colonize an alien planet, research new technologies, amass mighty armies, build incredible Wonders and shape the face of your new world. As you embark on your journey you must make critical decisions. From your choice of sponsor and the make-up of your colony, to the ultimate path you choose for your civilization, every decision opens up new possibilities.

 

AN AMD GAMING EVOLVED COLLABORATION

Firaxis Games and AMD have been in close collaboration on Sid Meier’s Civilization: Beyond Earth for many months, and indeed Firaxis has been an enthusiastic advocate and development partner for Mantle. Looking back at comments made by the studio in April, AMD Radeon™ customers definitely have cause for excitement:

 

By reducing the CPU cost of rendering, Mantle will result in higher frame rates on CPU-limited systems. As a result, players with high-end GPUs will have a much crisper and smoother experience than they had before, because their machines will no longer be held back by the CPU. On GPU-limited systems, performance may not improve, but there will still be a considerable drop in power consumption. This is particularly important given that many of these systems are laptops and tablets. The reduced CPU usage also means that background tasks are much less likely to interfere with the game’s performance, in all cases.


Finally, the smallness and simplicity of the Mantle driver means that it will not only be more efficient, but also more robust. Over time, we expect the bug rate for Mantle to be lower than D3D or OpenGL.  In the long run, we expect Mantle to drive the design of future graphics APIs, and by investing in it now, we are helping to create an environment which is more favorable to us and to our customers.


These benefits should come as no surprise to gamers that have been following the history of Mantle, but they’ve been put to particularly good use in Civilization. Let’s dig in!

 

MANTLE IN SID MEIER’S CIVILIZATION: BEYOND EARTH

Mantle is a high-efficiency graphics interface (an “API”) that permits supporting software to leverage the complete capabilities of an AMD Radeon™ graphics card. Mantle does this by reducing software bottlenecks and widening the parallelization of a game’s renderer.

 

Akin to allowing more cars on the road with no additional congestion, Mantle’s design endows a PC with the power to process more simultaneous information. New rendering techniques, higher framerates, more fluid gameplay and superior visual fidelity are all possible with Mantle. AMD is over a year ahead of other graphics companies in delivering this kind of technology to its customers and development partners.

 

John Kloetzli, Firaxis Games’ Principal Graphics Programmer for Civilization: Beyond Earth, put it this way:

 

“If you play [Civilization: Beyond Earth] for 40 hours, you’ve built an enormous empire. There’s a huge amount going on, besides just these tactical battles. We do allow you to zoom out quite far.  […] When you back up, you see your whole empire at once. That’s demanding. That’s when the performance, typically, in PC strategy games begins to go down. This is exactly the situation wherein we’re incredibly excited about Mantle.”

 

We also asked John if Mantle was difficult or complicated to implement:

 

“There definitely is cost involved [for supporting Mantle]. It’s definitely not an API that’s going to hold your hand and it’s not for hobbyists, really. But Mantle is not a significant overhead for a professional graphics team to add to a game. In fact, I did most of the design and programming of the graphics features in [Civilization: Beyond Earth] myself, and I also found time to do the vast majority of the programming for our Mantle backend as well. We fit it in our production schedule, it didn’t push us back any, and we’ll release [Mantle] concurrently with the DirectX® 11 version.”

 

That sounds like a winning combination for gamers and developers. Let’s see how Firaxis put Mantle to use!

 

MANTLE SPLIT-FRAME RENDERING WITH AMD CROSSFIRE™ TECHNOLOGY

UPDATE: Firaxis Games has published additional commentary on split-frame rendering (SFR) in Mantle. You should give it a read!

 

With a traditional graphics API, multi-GPU arrays like AMD CrossFire™ are typically utilized with a rendering method called “alternate-frame rendering” (AFR). AFR renders odd frames on the first GPU, and even frames on the second GPU. Parallelizing a game’s workload across two GPUs working in tandem has obvious performance benefits.

 

As AFR requires frames to be rendered in advance, this approach can occasionally suffer from some issues:

  • Large queue depths can reduce the responsiveness of the user’s mouse input
  • The game’s design might not accommodate a queue sufficient for good mGPU scaling
  • Predicted frames in the queue may not be useful to the current state of the user’s movement or camera
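The round-robin nature of AFR, and the reason a deep frame queue hurts responsiveness, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the delay estimate assumes that each queued frame adds roughly one frame time before the player's input shows up on screen.

```python
def afr_gpu_for_frame(frame_number: int, gpu_count: int = 2) -> int:
    """Alternate-frame rendering: frames are dealt out to the GPUs in turn.

    frame_number is 1-based, so odd frames go to GPU 0 and even frames
    go to GPU 1 in a two-GPU setup.
    """
    return (frame_number - 1) % gpu_count


def queued_input_delay_ms(queue_depth: int, fps: float) -> float:
    """Rough extra delay, assuming one frame time per queued frame."""
    return queue_depth * 1000.0 / fps


for frame in range(1, 5):
    print(f"frame {frame} -> GPU {afr_gpu_for_frame(frame)}")

print(f"3 queued frames at 60 fps: {queued_input_delay_ms(3, 60):.1f} ms")
```

Under that assumption, every extra frame held in the queue at 60 fps costs about 16.7 ms of responsiveness, which is why the queue depth matters to how the mouse feels.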

 

Thankfully, AFR is not the only approach to multi-GPU. Mantle empowers game developers with full control of a multi-GPU array and the ability to create or implement unique mGPU solutions that fit the needs of the game engine.

 

In Civilization: Beyond Earth, Firaxis designed a “split-frame rendering” (SFR) subsystem. SFR divides each frame of a scene into proportional sections, and assigns a rendering slice to each GPU in AMD CrossFire™ configuration.2 The “master” GPU quickly receives the work of each GPU and composites the final scene for the user to see on his or her monitor.
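To make the idea concrete, here is a minimal CPU-side sketch of split-frame rendering: divide the rows of a frame proportionally, let each "GPU" fill its slice, and stitch the slices back together. It is not Firaxis's implementation or Mantle code; all names are hypothetical, and the "rendering" simply tags each pixel with the ID of the GPU that produced it.

```python
def split_rows(height, weights):
    """Return one (start, end) row range per GPU, proportional to weights."""
    total = sum(weights)
    ranges, start, accumulated = [], 0, 0
    for weight in weights:
        accumulated += weight
        end = round(height * accumulated / total)
        ranges.append((start, end))
        start = end
    return ranges


def render_slice(gpu_id, rows, width):
    """Stand-in for a GPU rendering its slice: tag each pixel with gpu_id."""
    start, end = rows
    return [[gpu_id] * width for _ in range(start, end)]


def composite(slices):
    """The 'master' GPU stitches the slices into one frame, top to bottom."""
    frame = []
    for rendered in slices:
        frame.extend(rendered)
    return frame


ranges = split_rows(2160, [1, 1])  # two equally capable GPUs
slices = [render_slice(gpu, rows, 8) for gpu, rows in enumerate(ranges)]
frame = composite(slices)
print(ranges, len(frame))
```

Because both GPUs work on the same frame, no frames need to be queued ahead of time, which is the property that distinguishes SFR from AFR.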

 

ESSENTIAL READING: How does split frame rendering work in Civilization: Beyond Earth?

 

As you can probably surmise, SFR requires high parallelization, efficient inter-GPU communication, and reliable delivery of slices to the master GPU. AMD Radeon™ graphics cards running Mantle are uniquely equipped to meet those requirements.

 

NOTE: Sid Meier’s Civilization®: Beyond Earth™ presently supports a maximum of two graphics cards. To try mGPU on Mantle for yourself, navigate to %homepath%\Documents\my games\Sid Meier's Civilization Beyond Earth\ in "My Computer." Open the GraphicsSettings.ini file and set "Enable MGPU=1".


MANTLE MULTI-THREADED COMMAND BUFFER SUBMISSION

As Mantle rises to meet the parallelization requirements of SFR, Mantle also supercharges Beyond Earth’s ability to utilize a gamer’s multi-core CPU.

 

In computer graphics, a “command buffer” is a type of memory buffer containing instructions (or “commands”) that the GPU will execute to carry out required rendering workloads. Feeding the GPU with a continuous, uninterrupted flow of commands is essential to keeping the whole graphics card at high utilization. High utilization can yield higher framerates and/or higher image quality, depending on the focus of the game developer.

 

CivBeyondEarth07.jpg

 

Mantle is remarkable in its ability to spread a game engine’s command buffer submissions across multiple CPU cores, ultimately allowing for a wider stream of graphics work to be processed and queued to the GPU.
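The overall shape of multi-threaded command buffer recording can be sketched as follows. This is a structural illustration in Python, not the Mantle API: the function names and the string "commands" are hypothetical, and Python threads do not give true CPU parallelism, so the example shows the pattern rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(draw_calls):
    """Stand-in for recording: turn one batch of draw calls into commands."""
    return [f"draw {name}" for name in draw_calls]


def build_frame(batches, workers=4):
    """Record each batch on a worker thread, then submit them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so submission stays ordered
        buffers = list(pool.map(record_command_buffer, batches))
    gpu_queue = []
    for buffer in buffers:  # one ordered submission to the GPU queue
        gpu_queue.extend(buffer)
    return gpu_queue


batches = [["terrain", "water"], ["units", "cities"], ["ui"]]
queue = build_frame(batches)
print(queue)
```

The design point is that recording, the expensive CPU-side part, is spread across cores, while the final submission to the GPU stays a single ordered step.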

 

In the case of Sid Meier’s Civilization: Beyond Earth, you’ll see later in this blog that this wide communication lane to the AMD Radeon™ GPU is used to sustain higher overall framerates when empires get large and detailed in the late game.

 

EQAA in Mantle

Aliasing, the nasty “jaggies” on the edges of 3D objects in a PC game, is the bane of gamers everywhere. Aliasing is produced when a sharp edge is rendered to a monitor that doesn’t offer sufficiently high pixel density to properly express a smooth line.

 

There are many types of anti-aliasing designed to combat this unwanted phenomenon, and the majority of them fall into a category known as “multisample anti-aliasing” or MSAA. As the name implies, MSAA relies on “samples,” which is a graphics card’s test for whether or not a pixel on your monitor is occupied by one or more objects from the game world. If a pixel is covered by more than one triangle then the final contents/color of that pixel will be a blend of the information covering that pixel to produce a smoother edge.

 

Games and GPUs can cooperate to increase the number of samples being taken with each pixel, and these samples may test for color or coverage. Higher coverage sampling improves the accuracy of detecting whether or not an object occupies the pixel; higher color sampling improves the blending between samples confirmed to be occupied. Gamers increase the sample rate by choosing 2x, 4x or 8x MSAA, causing every pixel to be tested for color and coverage in two, four or eight locations.
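A toy version of the coverage-and-blend idea looks like this. It is a simplified sketch under stated assumptions: the "object" is everything to the left of a vertical edge crossing a single pixel, the sample points are evenly spaced along the pixel (real hardware uses tuned sample patterns), and the function names are hypothetical.

```python
def coverage(edge_x, sample_count):
    """Fraction of samples in a unit-wide pixel lying left of the edge."""
    xs = [(i + 0.5) / sample_count for i in range(sample_count)]
    return sum(1 for x in xs if x < edge_x) / sample_count


def resolve(foreground, background, covered):
    """Blend two RGB colours according to how much of the pixel is covered."""
    return tuple(
        round(f * covered + b * (1 - covered))
        for f, b in zip(foreground, background)
    )


white, black = (255, 255, 255), (0, 0, 0)
for samples in (1, 2, 4, 8):
    cov = coverage(0.3, samples)
    print(f"{samples}x: coverage {cov:.3f} -> {resolve(white, black, cov)}")
```

With a single sample the pixel is either fully inside or fully outside the object; with more samples the blended colour moves closer to the true 30% coverage, which is what smooths the edge.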

 

LEARN MORE: A Quick Overview of MSAA

 

Like MSAA, AMD’s Enhanced Quality Anti-Aliasing (EQAA) also comes in 2x, 4x and 8x sampling modes, but each EQAA mode takes twice as many coverage samples as MSAA. Increased coverage testing allows the GPU to more accurately detect objects within a pixel, potentially allowing EQAA to detect and smooth a hard edge that might have been missed with fewer samples. Coverage samples are computationally cheaper than color samples, so EQAA proves to be a good compromise between quality and performance.
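The benefit of extra coverage samples can be illustrated with a thin sliver of geometry that only just enters a pixel. This sketch assumes evenly spaced sample points and a vertical edge, and the function names are hypothetical; it follows the description above, where each EQAA mode keeps the MSAA colour-sample count and doubles the coverage samples.

```python
def sample_counts(mode, eqaa):
    """Return (colour samples, coverage samples) for a 2x/4x/8x mode."""
    return (mode, mode * 2 if eqaa else mode)


def coverage_fraction(edge_x, coverage_samples):
    """Fraction of coverage samples lying left of a vertical edge."""
    xs = [(i + 0.5) / coverage_samples for i in range(coverage_samples)]
    return sum(1 for x in xs if x < edge_x) / coverage_samples


sliver = 0.1  # the object covers only the left 10% of the pixel
for name, eqaa in (("4x MSAA", False), ("4x EQAA", True)):
    colour, cover = sample_counts(4, eqaa)
    detected = coverage_fraction(sliver, cover)
    print(f"{name}: {colour} colour / {cover} coverage -> {detected}")
```

In this toy layout the four MSAA samples all miss the sliver, while the eight EQAA coverage samples catch it, so the edge can be blended instead of being dropped.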

 

EQAA_samples.png

 

Civilization: Beyond Earth automatically enables EQAA in Mantle (and DirectX®!) on supporting AMD Radeon™ GPUs when the user chooses to enable the in-game anti-aliasing options.

 

Customers with older GPUs that lack hardware support for Mantle can still take advantage of EQAA through the AMD Catalyst™ graphics driver. Simply enable 2x, 4x or 8xMSAA in the options menu of your favorite game (if supported), and ensure you have “enhance application settings” selected in the 3D Application Settings tab of AMD Catalyst™ Control Center.

 

SINGLE-GPU PERFORMANCE

Throughout this blog you’ve learned how Mantle can be used to enable great multi-GPU responsiveness, superior CPU multi-threading and smooth anti-aliasing. But thousands of customers effectively tell us every day that single-GPU performance matters more than anything – by owning single-GPU systems!

 

Our collaboration with Firaxis Games to integrate Mantle with Civilization: Beyond Earth is a landmark technical achievement that proves we’re listening. Across every GPU comparison we tested, AMD Radeon™ graphics cards with Mantle delivered the best performance. In fact, in our testing the AMD Radeon™ R9 290X 8GB is the fastest single-GPU graphics card on the planet for playing Civilization: Beyond Earth. It doesn’t get any simpler than that.3

CivBE_4k_Ultra_8xAA.png

CivBE_1440p_Ultra_8xAA.png

CivBE_1080p_High_4xAA.png

 

WRAP-UP

AMD and Firaxis Games have worked together for months, not only to equip Civilization: Beyond Earth with a Mantle-based renderer, but to refine the Mantle specification with the features that Firaxis wanted to see. Hundreds of collaborative man hours are coming together for AMD Radeon™ customers at this very moment, and the results speak for themselves: fast, beautiful, efficient performance for Sid Meier’s Civilization: Beyond Earth.

 

That is the power of the AMD Gaming Evolved Program. We hope you enjoy one more turn!

 


Sid Meier's Civilization: Beyond Earth is a technology partner in the AMD Gaming Evolved program. Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

 

  1. Mantle application support is required.
  2. AMD CrossFire™ technology requires an AMD CrossFire Ready motherboard and may require a specialized power supply and AMD CrossFire Bridge Interconnect. Check with your component or system manufacturer for specific model capabilities.
  3. In Sid Meier’s Civilization®: Beyond Earth™ internal benchmark test at 3840x2160, the AMD Radeon™ R9 290X 8GB with Mantle outperforms the GeForce GTX 980 with DirectX® 11, NVIDIA’s highest-performing single-GPU graphics card as of October 20, 2014, by 45.38 average FPS to 44.89 average FPS using the Ultra in-game preset with 8xAA. Test system: Intel Core i7-4960X, 16GB DDR3-1866, Asus SABERTOOTH X79, Windows 8.1 x64, AMD Catalyst™ 14.9.2 Beta and ForceWare 344.16 WHQL.

Alien: Isolation™ hits the streets today promising to test your fortitude for playing in the dark. While you’re busy skulking through Alien-infested corridors, no doubt hiding from those crazy telescoping jaws and a river of acid spit, have a pause to admire the world around you. That world is jam-packed with truly state-of-the-art rendering technology. Today we’ll be exploring how AMD and The Creative Assembly utilized the resources of the AMD Gaming Evolved program to develop and optimize those technologies for DirectX® 11-ready AMD Radeon™ graphics cards.


NERD WARNING: Serious tech talk ahead! PC graphics junkies are in for a treat, but we’re going into exhaustive detail. Buckle up!

 

BUT FIRST, A LITTLE ABOUT THE GAME

Discover the true meaning of fear in Alien: Isolation, a survival horror set in an atmosphere of constant dread and mortal danger. Fifteen years after the events of Alien™, Ellen Ripley's daughter, Amanda enters a desperate battle for survival, on a mission to unravel the truth behind her mother's disappearance.

As Amanda, you will navigate through an increasingly volatile world as you find yourself confronted on all sides by a panicked, desperate population and an unpredictable, ruthless Alien.

 

Underpowered and underprepared, you must scavenge resources, improvise solutions and use your wits, not just to succeed in your mission, but to simply stay alive.

 

Want to see more of Alien: Isolation™? More killer videos are right over here.


IT’S BEAUTIFUL

PC gamers are in for a treat when they dial up the settings of Alien: Isolation. Alien: Isolation’s engine is all-new, written from the ground up to provide all of the advanced effects discussed in this blog. PC gamers will be delighted to learn that both console and PC performance envelopes were specifically targeted to provide a unique, highly-optimized experience on any system on which Alien: Isolation can be played.

 

ILLUMINATING THE SEVASTOPOL

To achieve the dramatic lighting effects on the Sevastopol, a setting in Alien: Isolation, a “deferred renderer” lies at the heart of its engine. This kind of renderer renders the entire scene visible to the player in a single pass, then stores all properties (e.g. positions and materials) required for beautiful lighting in a “G-Buffer.” The stored properties that matter to scene lighting can now be deferred until after the scene geometry is rendered, which makes the processing effort of lighting proportional only to the lighting complexity rather than lighting and geometry complexity. In short, the deferred renderer allows artists to place hundreds of dynamic lights in the scene and achieve great geometric detail simultaneously.
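The two-pass structure of a deferred renderer can be sketched on the CPU as follows. This is a deliberately tiny model, not the game's engine: the G-buffer is a list of per-pixel records, lighting is plain Lambert diffuse, and every name is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return tuple(x / length for x in v)


def geometry_pass(visible_surfaces):
    """Pass 1: store position, normal and albedo per pixel (the G-buffer)."""
    return [
        {"pos": s["pos"], "normal": normalize(s["normal"]), "albedo": s["albedo"]}
        for s in visible_surfaces
    ]


def lighting_pass(gbuffer, lights):
    """Pass 2: shade every stored pixel with every light (diffuse only)."""
    image = []
    for pixel in gbuffer:
        total = 0.0
        for light in lights:
            offset = tuple(l - p for l, p in zip(light["pos"], pixel["pos"]))
            facing = max(0.0, dot(pixel["normal"], normalize(offset)))
            total += facing * light["intensity"]
        image.append(pixel["albedo"] * total)
    return image


surfaces = [{"pos": (0, 0, 0), "normal": (0, 0, 1), "albedo": 0.5}]
lights = [
    {"pos": (0, 0, 2), "intensity": 1.0},   # in front of the surface
    {"pos": (0, 0, -2), "intensity": 1.0},  # behind it, contributes nothing
]
print(lighting_pass(geometry_pass(surfaces), lights))
```

Note that the lighting loop never touches scene geometry again: its cost is pixels times lights, which is why adding more triangles does not make each extra light more expensive.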

 

ai_gbuffer.png
TOP LEFT: Albedo, TOP RIGHT: Normal mapping, LOWER LEFT: Shininess, LOWER RIGHT: Fully-lit scene!

 

But the benefits of a deferred renderer are matched by some drawbacks. Foremost: limited support for diverse material types (e.g. metal, cloth, wood, skin, hair, etc.) and proper illumination of semi-transparent objects.

 

Classically, diverse material types must be rendered as a separate pass after the deferred lighting—a performance penalty. Alternatively, diverse materials can be treated with a grossly simplified physical model that doesn’t effectively simulate the true properties of those materials. Can you avoid sacrificing performance and/or quality if you want good lighting and realistic materials? Alien: Isolation proves that you can.

 

Alien: Isolation circumvents the materials issue through novel use of the GPU’s stencil buffer to tag the objects that use a unique material in the scene. The lighting/material interaction for each unique material type is rendered using a classic multi-pass technique, with the unique exception that the engine also tests the visibility of each material to the player’s field of view. Objects with unseen materials are rejected in the graphics pipeline to avoid paying the rendering penalty typically associated with the multi-pass lighting we mentioned above.
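As a rough CPU analogy for stencil tagging, imagine a per-pixel buffer of material IDs where extra passes run only for IDs that are actually on screen. The details below are assumptions for illustration: ID 0 stands for the default material, the material names are made up, and a real GPU performs the stencil test per pixel in hardware rather than by scanning a list.

```python
def materials_in_view(stencil_buffer):
    """Collect the material IDs that appear in at least one visible pixel."""
    return {mat for row in stencil_buffer for mat in row if mat != 0}


def shade_special_materials(stencil_buffer, passes):
    """Run a material's extra pass only if its tag is visible on screen."""
    visible = materials_in_view(stencil_buffer)
    executed = []
    for material_id, name in passes.items():
        if material_id not in visible:
            continue  # nothing on screen uses it, so the pass is skipped
        pixels = sum(row.count(material_id) for row in stencil_buffer)
        executed.append((name, pixels))
    return executed


stencil = [
    [0, 1, 1],
    [0, 0, 2],
]
passes = {1: "skin", 2: "hair", 3: "cloth"}
print(shade_special_materials(stencil, passes))
```

Here the "cloth" pass never runs because no visible pixel carries its tag, and the passes that do run only pay for the pixels that need them.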

 

And where semi-transparent objects are usually difficult for a deferred renderer, Alien: Isolation works around this as well. Only solid/opaque geometry can be rendered into the engine’s G-buffer, which means the semi-transparent geometry is normally rendered after the scene is composed using a reduced number of lights to conserve performance. The Creative Assembly’s solution is to dynamically generate a light map for each semi-transparent object. The light map is populated on-the-fly with the lighting data from the G-buffer, meaning translucent objects receive correct lighting regardless of scene complexity.

 

More technical details behind The Creative Assembly’s brilliant lighting model can be found in this presentation.

 

REAL-TIME RADIOSITY IN DIRECTCOMPUTE

Lighting an in-game world with direct sources like lamps and sunlight is not enough to achieve believable or realistic lighting. Here in the real world, rays of light bounce off of all kinds of reflective surfaces and scatter light into the surrounding area; those light rays continue to bounce around the room until all the energy from the rays has been absorbed. That bouncing and reflectivity is called “radiosity.”

 

Radiosity is an insanely difficult problem to solve in real-time graphics, and most games only fake it by using some form of full-scene ambient lighting. “Approximation” was not good enough for The Creative Assembly, who developed a full real-time radiosity engine for Alien: Isolation.

 

At the highest level, Alien: Isolation’s engine is constantly updating the radiosity model for the entire scene. This is achieved by placing a set of invisible “light probes” throughout the scene. Using Microsoft’s DirectCompute, these probes process how much light they are receiving from the lighting coming out of the deferred renderer. Lighting contributions from emissive surfaces, like computer screens and LED signs, are added to the data processed by the probe and combined with indirect (reflected) lighting coming from the previously-rendered frame. To light fixed or static objects in the scene visible to the player, the light probe data is crunched into lightmaps, applied to the geometry and rendered out.
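The feedback loop described above, where each frame reuses the previous frame's reflected light, can be illustrated with a toy probe update. This is not The Creative Assembly's algorithm: the neighbour lists, the single reflectance value and the scalar light levels are all simplifying assumptions made for the sketch.

```python
def update_probes(direct, emissive, previous, neighbours, reflectance=0.5):
    """One frame of a toy light-probe update.

    New value = direct light + emissive light + a fraction of the average
    light held by neighbouring probes in the previous frame (the bounce).
    """
    updated = []
    for i in range(len(direct)):
        nearby = neighbours[i]
        if nearby:
            bounced = sum(previous[j] for j in nearby) / len(nearby)
        else:
            bounced = 0.0
        updated.append(direct[i] + emissive[i] + reflectance * bounced)
    return updated


direct = [1.0, 0.0]      # probe 0 sits under a lamp, probe 1 is in shadow
emissive = [0.0, 0.0]    # no glowing screens in this tiny scene
neighbours = [[1], [0]]  # each probe sees the other
probes = [0.0, 0.0]
for frame in range(1, 5):
    probes = update_probes(direct, emissive, probes, neighbours)
    print(f"frame {frame}: {probes}")
```

After the first frame only the lit probe has light; from the second frame onward the shadowed probe picks up bounced light, and the values settle over a few frames.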

 

ai_radiosity2.png

LEFT: The radiosity lightmaps, RIGHT: The world lit only with lightmap data. Notice how precise the real-time lighting is.

 

For the dynamic objects in the world, such as characters and particle effects, the light probes are used to generate radiosity cubemaps via DirectCompute.

 

Finally, the use of DirectCompute for AMD Radeon™ graphics customers is especially important, as the award-winning Graphics Core Next (GCN) architecture was specifically designed with such “general purpose” languages in mind. Though that general purpose-ness was originally intended to be used in non-gaming scenarios, modern game engines have made great use of DirectCompute to quickly crunch highly-parallelized data. Awesome!

 

ai_radiosity.png
LEFT: Full engine render with radiosity disabled, RIGHT: Render with radiosity enabled. Notice the more subtle lighting throughout the scene, which fully accommodates reflections from metallic surfaces.

 

HIGH DEFINITION AMBIENT OCCLUSION+ (HDAO+)

To complement Alien: Isolation’s dynamic lighting and real-time radiosity, the renderer also uses HDAO+ (an AMD-developed technique) to calculate the shadows that are created when lighting reaches cracks and crevices throughout the scene. HDAO+ uses DirectCompute (good for AMD Radeon™ graphics!) to calculate the size and strength of these shadows. HDAO+ uses the information coming out of the G-buffer and computes at multiple resolutions to help achieve the best balance of quality and performance.

 

ai_hdao.png
TOP LEFT: HDAO+ disabled, TOP RIGHT: HDAO+ enabled, LOWER LEFT: All the shadows that would never get rendered without HDAO+.

 

BETTER TEXTURES IN THE YEAR 2137

Texture compression is essential for good performance in content-heavy games. With texture compression, developers can cram more textures into a scene without overloading the GPU’s framebuffers or exhausting memory bandwidth while loading those textures into VRAM.

 

The industry has long relied on “DXT” compression which compresses each 4x4 block of pixels from the original image into a data set that’s one quarter to one eighth the size. These textures can be decompressed on the fly with dedicated capabilities in AMD Radeon™ graphics hardware.

 

The problem with compressing textures is that artifacts are introduced due to the compression scheme. You’ve seen pixelated and blocky JPEG files, and the DXT artifacts are not dissimilar. The Abs Error column below isolates these errors, with more color indicating a higher artifact quantity.

 

DirectX® 11 introduced a better, more complex compression scheme called “BC7” that still compresses to a quarter of the size of the original image but significantly reduces the artifacts normally associated with the older DXT methods like BC3. AMD Radeon™ graphics hardware is ready for DirectX® 11.2, meaning gamers with that hardware will have access to the BC7-compressed texture pack for superior texture fidelity.
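The compression ratios mentioned above fall out of the block sizes of these formats. The sketch below assumes a 32-bit RGBA source image; the block sizes are the standard ones for BC1 (DXT1), BC3 (DXT5) and BC7, while the helper names are ours.

```python
BLOCK_PIXELS = 4 * 4                   # each format encodes 4x4 pixel blocks
UNCOMPRESSED_BYTES = BLOCK_PIXELS * 4  # 4 bytes per pixel for 32-bit RGBA

# Bytes used to store one compressed 4x4 block.
BLOCK_BYTES = {"BC1 (DXT1)": 8, "BC3 (DXT5)": 16, "BC7": 16}


def compression_ratio(fmt):
    """How many times smaller a block is than its 32-bit RGBA original."""
    return UNCOMPRESSED_BYTES / BLOCK_BYTES[fmt]


def texture_bytes(width, height, fmt):
    """Size of one mip level, rounding dimensions up to whole blocks."""
    blocks = ((width + 3) // 4) * ((height + 3) // 4)
    return blocks * BLOCK_BYTES[fmt]


for fmt in BLOCK_BYTES:
    size_mib = texture_bytes(2048, 2048, fmt) / (1024 * 1024)
    print(f"{fmt}: {compression_ratio(fmt):.0f}:1, 2048x2048 = {size_mib:.0f} MiB")
```

BC7 occupies the same space as BC3, so the quality improvement comes from how the bits inside each block are spent rather than from using more memory.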

 

DXTC.png
The high artifact level depicted in the BC3 abs error column would be seen as fuzzy or blocky textures by the player. The low abs error rate on BC7 texture compression preserves performance and quality for AMD Radeon™ graphics users.

 

LURKING IN THE SHADOWS

Realistic shadowing is an essential ingredient of Alien: Isolation’s creepy atmosphere. To make these shadows as realistic as possible, The Creative Assembly team tapped AMD’s “contact hardening shadow” technology. This technique dynamically hardens or softens a shadow’s edges depending on the distance of the shadow from the light source and object casting that shadow.
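The exact implementation is not described here, but the relationship between distance and edge softness can be illustrated with the similar-triangles estimate commonly used by percentage-closer soft shadow techniques. Treat this as an assumption about the general approach, not as AMD's code; the function and parameter names are hypothetical, and distances are measured from the light.

```python
def penumbra_width(receiver_distance, blocker_distance, light_size):
    """Estimate how wide the soft edge of a shadow is.

    The further the shadowed surface is behind the object casting the
    shadow, the wider (softer) the edge becomes.
    """
    if blocker_distance <= 0 or receiver_distance < blocker_distance:
        raise ValueError("expected 0 < blocker_distance <= receiver_distance")
    gap = receiver_distance - blocker_distance
    return gap * light_size / blocker_distance


# Same light and surface, with the occluder moved further from the surface.
for blocker in (10.0, 5.0, 2.0):
    width = penumbra_width(10.0, blocker, 1.0)
    print(f"blocker at {blocker}: penumbra width {width}")
```

When the occluder touches the surface the width is zero, giving a hard "contact" edge, and the edge softens as the occluder moves away.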

 

While shadowing techniques are incredibly efficient on the Graphics Core Next (GCN) architecture in contemporary AMD Radeon™ graphics products, this technique nevertheless requires a powerful GPU and can only be enabled when the “ultra” in-game graphics preset is enabled.

 

ai_chs.png
LEFT: Contact Hardening Shadows disabled, RIGHT: Contact Hardening Shadows enabled. Notice that the shadows are softer and more realistically diffuse with this effect enabled.

 

GPU-ACCELERATED PARTICLES

The particle effects in Alien: Isolation breathe life into the eerie setting of the Sevastopol. For everything from fire and smoke effects to the streams of sparks generated by Ripley’s blow torch, an efficient way to simulate the thousands of simultaneous particles is to run a physical simulation on an AMD Radeon™ GPU.

 

The different characteristics of these particle types are artist-controlled using parameters baked into the metadata of a texture. Particles can be affected by velocity fields and bounced off the scene geometry by reading data out of the G-buffer. When it's time to render for the player, the particle physics are GPU-accelerated with DirectCompute on AMD Radeon™ graphics cards!
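A single simulation step for one particle might look like the sketch below, written in 2D for brevity. It is a CPU illustration with hypothetical names: the velocity field is an ordinary function, and `floor_height` stands in for the surface position a GPU implementation would read back from the G-buffer.

```python
def step_particle(pos, vel, field, floor_height, dt, bounce=0.5):
    """Advance one particle by dt seconds.

    The particle moves with its own velocity plus the local field velocity;
    if it ends up below the floor it is placed on the floor and its
    vertical velocity is reflected and damped.
    """
    field_x, field_y = field(pos)
    x = pos[0] + (vel[0] + field_x) * dt
    y = pos[1] + (vel[1] + field_y) * dt
    vx, vy = vel
    if y < floor_height:
        y = floor_height
        vy = -vy * bounce
    return (x, y), (vx, vy)


def wind(_pos):
    """A constant sideways draft, used here as the velocity field."""
    return (0.5, 0.0)


position, velocity = step_particle((0.0, 0.1), (1.0, -5.0), wind, 0.0, 0.5)
print(position, velocity)
```

Each particle is updated independently of the others, which is what makes this kind of workload a good fit for running thousands of updates in parallel on a GPU.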

 

ai_particles.jpg

Affected by thermoclines and world geometry, embers soar into the sky backed by a real physics simulation calculated on an AMD Radeon™ graphics card.

 

SMOOTHIN’ THOSE SURFACES

Throughout Alien: Isolation, the Graphics Core Next architecture’s prowess with geometry tessellation is put to excellent use with silhouette-enhancing tessellation. This kind of tessellation smartly adds detail to a scene by dynamically increasing geometric complexity only on the edges of objects visible to the player. This calculated exercise of tessellation improves details on pipes, padding and alien hives without wasting GPU cycles on invisible work.
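One simple way to concentrate detail on silhouettes is to raise the tessellation factor where a surface is seen edge-on. The heuristic below is an assumption for illustration, not the game's shader: it uses the angle between the surface normal and the view direction, and the names and the maximum factor of 8 are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return tuple(x / length for x in v)


def tessellation_factor(normal, view_dir, max_factor=8.0):
    """Return 1.0 for surfaces facing the viewer, up to max_factor at edges.

    A surface seen edge-on (normal perpendicular to the view direction)
    lies on the silhouette, so it receives the most subdivision.
    """
    facing = abs(dot(normalize(normal), normalize(view_dir)))
    return 1.0 + (max_factor - 1.0) * (1.0 - facing)


view = (0.0, 0.0, 1.0)
print(tessellation_factor((0.0, 0.0, 1.0), view))  # facing the camera
print(tessellation_factor((1.0, 0.0, 0.0), view))  # on the silhouette
```

Flat, camera-facing surfaces keep their original triangles, so the extra geometry is spent only where it changes the outline the player sees.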

 

ai_tessellation.png

TOP LEFT: Tessellation disabled, TOP RIGHT: Tessellation enabled, LOWER LEFT: Tessellation disabled (wireframe), LOWER RIGHT: Tessellation enabled (wireframe). Notice the increased geometric complexity and detail.


PERFORMANCE

Now that you’ve seen how AMD and The Creative Assembly collaborated to implement a host of AMD Radeon™ graphics-optimized effects in this stellar new game engine, let’s see how it performs! We’ll let the charts speak for themselves—AMD dominates!

 

ai_perf2.png

ai_perf1.png

ai_perf3.png

 

WRAP-UP

When you’re done messing your knickers and fleeing from Aliens, stop to appreciate what’s around you:

  1. Unique PC effects for everyone to enjoy;
  2. and AMD Radeon™ graphics-optimized performance for AMD customers.

Those are our top missions in the AMD Gaming Evolved program, and we’re proud to support developers, like The Creative Assembly, who are equally passionate about PC gaming.

 

Speaking of AMD Gaming Evolved, you may have heard of our new Never Settle: Space Edition promotion. Never Settle: Space Edition leverages the AMD Gaming Evolved partnerships we have with developers like The Creative Assembly to give you complimentary codes for games, like Alien: Isolation, with the purchase of an eligible AMD Radeon™ R9 Series GPU from a participating retailer.

 

Get AMD Radeon™ graphics and get your game on!

 


Alien: Isolation is a technology partner in the AMD Gaming Evolved program. Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

NOTE: Sniper Elite III is available on Steam or in the Never Settle: Space Edition promotion with support for the Mantle graphics API starting today!

This blog was authored by Kevin Floyer-Lea, Head of Programming, Rebellion Developments. It has been reprinted with permission.


Over the last few months at Rebellion we've taken our in-house Asura engine used in Sniper Elite 3 and added support for AMD's Mantle API. Our Head of Programming Kevin Floyer-Lea brings us up-to-date with the story so far...

 

WHY MANTLE?

 

The primary goal of Mantle is to provide a low-level interface that allows applications to speak directly to AMD's "Graphics Core Next" family of GPUs - greatly reducing the CPU overhead of translating commands for the GPU. With more traditional APIs like DirectX 11 there is often a disconnect between how costly a developer thinks (hopes!) an API call will be, and how much work the driver actually ends up doing underneath.

 

In simple terms the expected CPU gains of Mantle should be twofold. Firstly, making a command stream for the GPU should be less work on the CPU - and without any "surprises" or mysterious stalls. Secondly, the making of command streams can be entirely multithreaded. The native support of multithreading is perhaps one of the most important features from Rebellion’s point of view - while Microsoft had made some attempts at supporting multithreading with DX11 it was fundamentally limited by the single-threaded design choices of the previous versions.
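The shape of that second point can be sketched in a few lines: split the frame's draws into batches, record each batch on a worker thread, then submit the finished lists in order. This is not Mantle code; the command format and function names are hypothetical, and Python threads will not actually run this faster. Only the structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(batch):
    """Build one command list for a batch of draw items (hypothetical format)."""
    return [("draw", item) for item in batch]


def build_frame(draw_items, workers=4):
    """Record command lists in parallel, then submit them in batch order."""
    size = max(1, -(-len(draw_items) // workers))  # ceiling division
    batches = [draw_items[i:i + size] for i in range(0, len(draw_items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input batches.
        command_lists = list(pool.map(record_commands, batches))
    # The single "submit" step: concatenate the lists in their original order.
    return [cmd for cmds in command_lists for cmd in cmds]
```

Each worker touches only its own list, so no locking is needed until the final ordered submit.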

 

Furthermore, with Mantle the developer gains access to things that drivers typically hide away - like the GPU's dedicated memory. This brings the PC closer to console programming, where developers are used to having direct control over available resources and squeezing the most out of the hardware.

 

It was these aspects which drew us to supporting Mantle - we'd long wished for the sort of control we had on console on our PC titles, and it was clear that whatever else may happen with Mantle in the future, it's most definitely kick-started a move to more lightweight APIs as we've seen with recent announcements concerning Microsoft’s DirectX 12, Apple's Metal, and Khronos’ Next Generation OpenGL Initiative.

 

OUR AIMS

 

Our main goal for supporting Mantle was to take maximum advantage of the potential for multithreading the API calls, and refactor our existing engine rendering pipeline to better fit what we predict are the requirements of this new breed of lightweight APIs. In that respect we spent more time restructuring our engine's rendering architecture than we did writing Mantle-specific code!

 

It was also important that we reused exactly the same data and assets as the (already shipped!) DX11 version of Sniper Elite 3 - so we wouldn't be optimising any shaders, data formats or rendering techniques at this stage - we'd just be shipping a new executable and reusing the same assets. This was primarily done to reduce cost and risk - but in hindsight it makes us a fairly unbiased test case between the two APIs.

 

What we have now is a fairly preliminary implementation in many respects - as Asura is a fully cross-platform engine designed to work on multiple platforms simultaneously, we aim to build upon this work to make a more independent code layer which sits over multiple low-level APIs as they become available.

 

EARLY RESULTS

 

For our first comparison let’s look at the beginning of the “Siwa” level of Sniper Elite 3, which is one of the more graphically demanding start positions in the game as it encompasses lots of layered scenery and vegetation stretching off to the old city complex in the distance. Half-hidden in the scene are dozens of people and some vehicles which the culling system can’t remove because they are actually visible – just not that obvious. Gameplay hasn’t really kicked off yet so the rest of the engine’s systems are idling along; rendering is the biggest CPU hit here.

 

se3.jpg

Below is what Task Manager reports if we just sit at the start position for 60 seconds. This is using an Intel i7-3770K CPU with 8 logical processors, coupled with an AMD R9 290X GPU, running on Ultra settings at a resolution of 1920x1200 – so we’d expect to be GPU bound in this scenario.

 

se3cpudata.jpg

 

The Mantle version clearly shows a much more balanced CPU load across the cores – though the total CPU utilisation has only dropped from 23% on DirectX 11, to 21% on Mantle. The more balanced load is exactly as we’d hoped, since all the Mantle API calls are now distributed across the available cores by our Asura engine’s multithreaded task system, just like we do for other systems like AI, animation or physics.

 

It’s worth noting that Sniper Elite 3 and the Asura engine are already optimised to account for DirectX 11’s weaknesses. For example, we make heavy use of instancing and similar batching techniques to reduce the number of draw calls we make per frame – all the usual things to reduce CPU overhead, which means Mantle will have fewer easy wins compared to other draw-call-heavy titles.

 

So that’s what the CPU is doing – but what’s the actual framerate? On those settings we’re running at an average of 88fps on DX11, and 100fps on Mantle – around a 14% speed increase. This explains why the total CPU utilisation is still quite similar – with Mantle the CPU has to cope with 12 more frames every second, meaning we’re packing in more work and still using less CPU power. Furthermore, because the work is more distributed, if we increase CPU load (say by using a faster graphics card, or by lowering resolution) a single logical processor is less likely to become the bottleneck.
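For anyone who wants to check the arithmetic, the two frame rates quoted above work out as follows. The only inputs are the 88fps and 100fps figures from the text.

```python
def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


dx11_fps, mantle_fps = 88, 100
speedup_pct = (mantle_fps / dx11_fps - 1.0) * 100.0
saved_ms = frame_time_ms(dx11_fps) - frame_time_ms(mantle_fps)

print(f"speed increase: {speedup_pct:.1f}%")
print(f"frame time: {frame_time_ms(dx11_fps):.2f} ms -> {frame_time_ms(mantle_fps):.2f} ms")
```

That is roughly 1.4 ms saved on every frame, which is where the "around 14%" figure comes from.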

 

The size of the frame-rate increase is a pleasant surprise, as frankly at this stage in development we were expecting to have a more roughly equal frame-rate when GPU bound. There’s still a fair amount of scope for increasing performance with Mantle, particularly as we’re not yet taking advantage of the Asynchronous Compute queue. This would allow us to take some of our expensive compute shaders – like our Obscurance Fields technique – and schedule them to run in parallel with the rendering of shadow maps, which are particularly light on ALU work.

 

One reason for the performance gains seen so far may be the way we are handling the GPU’s memory - we pre-allocate VRAM in large chunks and then directly manage and defragment that memory ourselves. Similarly when updates for dynamic data and streaming textures are needed, we DMA copy the affected memory as part of our command stream to the GPU - thus eliminating the sort of copying and duplicating of buffers the DirectX drivers might have to do.
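A toy version of that "allocate one big chunk, then manage it yourself" approach might look like the sketch below. It is not the Asura engine's allocator; the class, its bookkeeping and the compaction policy are all hypothetical, and on real hardware each block move during defragmentation would be a GPU copy.

```python
class ChunkAllocator:
    """Sub-allocates byte ranges out of one pre-allocated chunk."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.blocks = {}  # name -> (offset, size)

    def _end(self):
        """Offset just past the highest live block."""
        return max((off + size for off, size in self.blocks.values()), default=0)

    def alloc(self, name, size):
        """Allocate at the end of the used region; defragment once if full."""
        if self._end() + size > self.chunk_size:
            self.defragment()
        offset = self._end()
        if offset + size > self.chunk_size:
            raise MemoryError("chunk exhausted")
        self.blocks[name] = (offset, size)
        return offset

    def free(self, name):
        del self.blocks[name]

    def defragment(self):
        """Slide live blocks down to close the gaps left by freed blocks."""
        cursor = 0
        for name, (_, size) in sorted(self.blocks.items(), key=lambda kv: kv[1][0]):
            self.blocks[name] = (cursor, size)
            cursor += size
```

The application, not the driver, decides when the cost of compaction is worth paying, which is the kind of control the paragraph above describes.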

 

Ironically, one unintended consequence of increased texture streaming performance, and the ability  to hold more textures at once given we have more control over memory, is that we’ve found that we often have far more high resolution textures being used in the Mantle version... which could in theory increase rendering time. Thankfully speed increases from other areas seem to have hidden this, so you’ll just get better looking textures!

 

Another big reason for the speed gains is the way Mantle handles shaders. On DirectX we’re accustomed to having separate shader stages that are treated independently – the common ones being vertex and pixel shaders. Mantle instead uses monolithic pipelines – a concept that combines all the shader stages and the relevant rendering state into a single object.

 

As well as taking less CPU overhead to use, having everything together in one pipeline allows for some holistic optimisations that otherwise wouldn’t be possible – for example, perhaps that value calculated in the vertex shader isn’t actually used in the pixel shader... so it could be optimised out entirely. This seems to have particularly benefitted Sniper Elite 3 when it comes to tessellation, where we’re making heavy use of all the traditional stages as well as hull and domain shaders.
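The unused-value example can be illustrated with a tiny sketch. It models each shader stage as nothing more than a list of value names, which is a big simplification, and the function is hypothetical rather than anything from Mantle's compiler.

```python
def link_pipeline(vertex_outputs, pixel_inputs):
    """Return (kept, removed) vertex outputs for a linked pipeline.

    When stages are compiled separately, neither compiler can see the
    other, so every vertex output must be kept. Linking both stages into
    one pipeline object makes unused outputs visible and removable.
    """
    used = set(pixel_inputs)
    kept = [name for name in vertex_outputs if name in used]
    removed = [name for name in vertex_outputs if name not in used]
    return kept, removed


kept, removed = link_pipeline(["uv", "normal", "tangent"], ["uv", "normal"])
print(kept, removed)
```

Once "tangent" is known to be dead, the vertex stage can skip calculating it as well as skip passing it along.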

 

BENCHMARKS

 

To make testing easier we’ve added a Benchmark option to Sniper Elite 3 – available on the “Extras” page from the game’s front end menus. The benchmark contains varying scenes similar to what happens in game, e.g. wide, long distance views; close-ups with tessellation; obscurance fields and shadows; a truck full of characters driving by; lots of special effects overdraw in a gratuitous slow-mo explosion. These put different degrees of stress on the CPU and GPU and hopefully give us a more representative view of what happens in the game as a whole.

 

A word of caution at this point - when leaving the benchmark running repeatedly, we found that the dynamic power management software can kick in, reducing GPU cycle speed and thus skewing the profiling results. So it’s a good idea to use something like AMD’s OverDrive panel to monitor your GPU and guarantee consistency – and possibly increase your allowed fan speed if you don’t mind trading noise for frame-rate!

 

At the end of the benchmark you’ll get an average frame-rate report, and a more detailed log file is saved out to your Documents folder. Our initial tests with the benchmark are showing very similar performance gains to those seen in the Siwa test above; here’s a breakdown using our R9 290X setup, varying both resolution and quality settings.


To guarantee we’re GPU bound for the final setting we’ll use 1920x1200 at Ultra quality with 4x supersampling – which means the engine internally renders everything at 3840x2400, and then right at the end downsamples back to 1920x1200 to give us an extremely good looking (and expensive) anti-aliased image.
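The resolution arithmetic and the final downsample step can be sketched as follows. The simple 2x2 box filter is an assumption made for illustration; the text does not say which filter the engine uses.

```python
import math


def internal_resolution(width, height, samples):
    """Resolution rendered internally for N-times supersampling.

    The sample count multiplies the pixel count, so each axis scales by
    sqrt(samples).
    """
    scale = math.isqrt(samples)
    if scale * scale != samples:
        raise ValueError("this sketch only handles square sample counts")
    return width * scale, height * scale


def downsample_2x(image):
    """Average each 2x2 block of a 2D list of brightness values."""
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, len(image[0]), 2)
        ]
        for y in range(0, len(image), 2)
    ]


print(internal_resolution(1920, 1200, 4))
```

Four times the pixels means roughly four times the shading work per frame, which is why this setting is a reliable way to become GPU bound.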

se3perf.png

Similarly, here are the results for an HD 7970, coupled with an older CPU that has only 4 logical processors:

 

se3perf2.png

Rather than going into more detail here we’ll let tech sites and interested users have a go themselves and come to their own conclusions. Let us know what you find!       

 

TRY IT YOURSELF

 

The latest version of Sniper Elite 3 now available on Steam has support for both Mantle and the Benchmark feature. To enable the Mantle build you need to select the “Use Mantle” tickbox in the game’s launcher, which is accessed via the Options button. The tickbox should be greyed out if you don’t have the requisite hardware or up to date drivers – we require AMD Catalyst™ 14.9 or later drivers which are available here:  http://support.amd.com/en-us/download


NOTE: be aware that these drivers only support Windows 7 and Windows 8.1 – not Windows 8.0! If you have Windows 8.0 you can update to 8.1 for free via the Windows Store page. Best to back stuff up first!


CONCLUSIONS

 

All in all, even this first pass of Mantle has delivered all that we’d hoped for:

 

  • Improved frame-rate
  • Reduced CPU power consumption (important for laptops)
  • Less susceptible to frame-rate spikes when other programs hit the CPU
  • Future scalability with higher numbers of cores
  • Scope for increasing scene and world complexity
  • Ability to increase the CPU budget for other systems like AI

 

The last two points are more relevant to our future games, and for now we need to see how this first pass of Mantle behaves in the wild and fix any issues that come up, before moving onto new features and improvements that would make sense to add to Sniper Elite 3. One big area that we haven't yet addressed which needs investigating is multiple GPU support - this can be a tricky area to get right.

 

The way DirectX11 handles multiple GPUs is “AFR” or Alternate Frame Rendering, which as the name suggests means if you have two comparably powered GPUs they simply take turns rendering frames. This is in many respects the easiest approach to take – and is a great way of making your game CPU bound! So possibly our Mantle version could show some big improvements when using this method.

 

However, with the independent control over the GPUs Mantle gives us, we could approach the problem very differently - for example one GPU could be rendering the basic geometry in the scene, while another handles lighting and shadows for the same frame, with the final image composited at the end. This may also provide a route for when GPUs aren’t of a comparable power level – for example an integrated APU motherboard coupled with a desktop GPU. It’s the potential for completely new approaches like this which excites me the most about Mantle and the APIs which will follow it.
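The difference between the two approaches is easiest to see as a work assignment. The sketch below is purely illustrative: the pass names and both functions are hypothetical, and a real split-work scheme would also have to move intermediate results between GPUs.

```python
PASSES = ["geometry", "lighting_and_shadows"]  # hypothetical pass names


def afr_plan(num_frames, num_gpus):
    """Alternate Frame Rendering: one GPU renders every pass of a frame."""
    return [{p: frame % num_gpus for p in PASSES} for frame in range(num_frames)]


def split_pass_plan(num_frames, num_gpus):
    """Split-work alternative: each pass of a frame goes to a different GPU."""
    return [{p: i % num_gpus for i, p in enumerate(PASSES)} for _ in range(num_frames)]


print(afr_plan(2, 2))
print(split_pass_plan(1, 2))
```

With the second plan, passes could in principle be weighted by how powerful each GPU is, which is what would make mismatched pairs workable.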

 

Kevin Floyer-Lea is Head of Programming at Rebellion Developments. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

WANTED: Experienced RPGers, tabletoppers and fantasy gamers that have grown weary of Wizards and Sorcs hamstrung by silly things like cooldowns, spells per day and expensive reagents. We know you pine for raining Meteor Swarms upon unsuspecting foes like an April Shower. If that sounds appealing to you, then perhaps Lichdom: Battlemage is the game for you.

 

Let me sweeten the deal a little further: Lichdom is optimized for AMD Radeon™ customers with the world’s first implementation of TressFX Hair v2.0, AMD TrueAudio technology, AMD Eyefinity technology, and validation for 4K gaming.1,2 With the power of unlimited magic, you get sweet gameplay. And with the power of AMD Gaming Evolved, you get sweet technology. Let’s dig in!

 

YOU ARE A BADASS

Welcome to the first game where the Mage is an unmitigated badass! With no mana pools or cool-downs, Lichdom: Battlemage throws out all of the classic tropes of playing a Mage. No longer is the character marginalized so that other classes can adventure through the same levels, and finally the true jaw-dropping potential of magic has been realized.

 

Lichdom: Battlemage is a first-person caster that focuses entirely on the Mage. With limitless magical power at your disposal and brutal enemies around every corner, victory hinges on a combination of skill and strategy. You must carefully craft a vast array of spells and learn to cast them in the heat of combat.

The Lichdom: Battlemage spell crafting system offers an enormous range of customization. Every Mage is the product of crafted magic that reflects the individual's play style. Whether you prefer to target your foes from a safe distance, wade into combat and unleash your power at point-blank range, or pit your enemies against each other, endless spell customization lets you become the Mage you want to be.

 

TRESSFX HAIR V2.0

For the complete story on TressFX Hair v2.0, you should read our recent blog that comprehensively explores the technology’s latest developments. However, below I’ve compiled an executive summary of the changes:

  • New functionality to support grass and fur
  • Continuous levels of detail (LODs) are designed to improve performance by dynamically adjusting visual detail as TressFX-enabled objects move towards and away from the player’s POV
  • Improved efficiency with many light sources and shaders via deferred rendering
  • Superior self-shadowing for better depth and texture in the hair
  • Even more robust scalability across GPUs of varying performance envelopes (vs. TressFX 1.0)
  • Modular code and porting documentation
  • Stretchiness now respects the laws of physics
  • and numerous bug fixes!

 

In general, TressFX 2.0 is a much more detailed, efficient and laws-of-physics-abiding technology than ever before. Awesome!

 

tressfxgryphon.jpg

And it’s double awesome that Lichdom: Battlemage is the very first game to make use of TressFX Hair v2.0. The effect is most prominently utilized on The Gryphon, a companion character to the protagonist. Both the male and female Gryphons feature TressFX, however the effect on the female companion is more pronounced by virtue of her haircut. You encounter The Gryphon early in Lichdom: Battlemage, and re-encounter her or him often throughout the game.

 

AMD TRUEAUDIO TECHNOLOGY

AMD TrueAudio technology is a hardware-level feature found on the AMD Radeon™ R9 295X2, R9 290X, R9 290, R9 285, R7 260X and R7 260 graphics cards. A small block of audio processing hardware is integrated directly into the graphics chips in these products. That audio processing hardware is called a “Digital Signal Processor,” or DSP.

 

blocks.png

A high-level diagram of the Tensilica Xtensa HiFi EP DSP cores and the associated hardware that comprises AMD TrueAudio technology in an AMD Radeon™ graphics chip.

 

A DSP is specialized silicon dedicated to the task of processing digital signals. Example applications for a DSP include: audio compression, audio filtering, speech processing and recognition, simulating audio environments, creating 3D sound fields and more.

 

DSPs are fully programmable, which allows developers to creatively harness the hardware in ways limited only by their imagination and skill. We are striving with AMD TrueAudio to give game developers a blank canvas for new and never-before-heard audio environments and techniques. We hope that, with time, game developers will do with programmable audio what programmable graphics pipelines did for PC graphics.

 

Best of all, the hardware-accelerated effects of AMD TrueAudio technology are experienced with any old stereo headphones. Your headset or earbuds will do just fine!

 

chain.png

AMD TrueAudio effects are processed and applied as in-game audio is being generated. This allows the user to experience AMD TrueAudio with plain stereo headphones and any existing sound chip.


AMD TRUEAUDIO IN LICHDOM: BATTLEMAGE

With respect to Lichdom: Battlemage, AMD TrueAudio is utilized to calculate an effect called “convolution reverb.” Convolution reverb is a technique that mathematically simulates the echoes (i.e. reverberation) of a real-life location. This effect is accomplished by recording an “impulse response,” which is a snapshot of the echo characteristics of a real-world location. That impulse response is fed back into software that can recreate that behavior in a PC game.
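At its core, the maths is a convolution of the "dry" game audio with the recorded impulse response. The sketch below shows the direct form on a handful of samples. It is an illustration only: real-time implementations typically use much faster FFT-based methods, and nothing here reflects how the game's DSP code is written.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution of a dry signal with an impulse response.

    The output has len(dry) + len(impulse_response) - 1 samples.
    """
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


# A single click in a "room" whose echo returns at half volume two
# samples later.
print(convolve([1.0, 0.0, 0.0], [1.0, 0.0, 0.5]))
```

Every input sample is smeared across the whole length of the impulse response, so a long cathedral echo means a lot of multiply-adds per second, which is the kind of load a dedicated DSP is suited to.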

 

Lichdom: Battlemage uses this technique to make buildings, cathedrals, alleyways and other in-game venues sound quite like they would in real life! In an environment where there are adjacent areas with different echo characteristics and impulse response (example: a cathedral adjacent to a cave and an open space), multiple convolution reverbs must be processed in parallel to create the most realistic sound environment. This effect is automatically enabled when an AMD TrueAudio-capable GPU is configured on the system.

 

You can experience the convolution reverbs for yourself most prominently in the level immediately following the opening tutorial mission. The soaring caverns and claustrophobic tunnels of the second stage make for an exciting and complex acoustic environment.

 

BIG SCREENS

Lichdom: Battlemage has achieved “validated” status for Eyefinity technology 3x1 configurations. This means that the user will enjoy the proper field of view, all menus and HUD elements will be placed correctly, cutscenes will be played without unexpected cropping or stretching, and more.  This is the highest level of compatibility we can award to any game. Additionally, this validation definitely makes Lichdom: Battlemage ready for 4K60 MST and 4K60 SST UltraHD displays!

 

PERFORMANCE

As an AMD Gaming Evolved title, Lichdom: Battlemage offers a solid performance advantage on AMD Radeon™ graphics cards. See the benchmarks below for results and recommended graphics settings for your GPU.3

 

Additionally, while the AMD Radeon™ R7 260X and R7 260 are unlisted in our charts, these users should run the game at 1080p with medium quality settings. TressFX Hair v2.0 and anti-aliasing should be disabled. Performance on these graphics cards can be in the mid- to high-30s with these settings.3

lichdom_perf1.png

lichdom_perf2.png

lichdom_perf3.png

WRAP-UP

If you have a hankering to be the Mage you always wished you could be, then pick up a copy of Lichdom: Battlemage from Steam today. And as you blast your way through cities and ruins overrun with the occult, take a moment to admire the scenery and the technology—you won’t be disappointed.

 


Robert Hallock does Technical Communications for Gaming & Desktop Graphics at AMD.

His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


FOOTNOTES:

  1. AMD TrueAudio technology is offered by select AMD Radeon™ R9 and R7 200 Series GPUs and is designed to improve acoustic realism. Requires enabled game or application. Not all audio equipment supports all audio effects; additional audio equipment may be required for some audio effects. Not all products feature all technologies — check with your component or system manufacturer for specific capabilities.
  2. AMD Eyefinity technology supports multiple monitors on an enabled graphics card. Supported display quantity, type and resolution vary by model and board design; confirm specifications with manufacturer before purchase. To enable more than three displays, or multiple displays from a single output, additional hardware such as DisplayPort™-ready monitors or DisplayPort 1.2 MST-enabled hubs may be required. A maximum of two active adapters is recommended for consumer systems. See www.amd.com/eyefinityfaq for full details.
  3. All resolutions and quality levels described by the performance diagrams were tested by AMD performance labs on the following platform: Intel Core i7-4960X, Asus X79 Sabertooth, 16GB DDR3-1866, Windows 8.1 x64. AMD Catalyst™ revision: 14.7 RC3. NVIDIA driver revision: 340.52 WHQL.

Since introducing TressFX Hair in the smash hit Tomb Raider™ last year, we’ve been diligently working to optimize the technology, enable compatibility with more platforms, and add new features. Today we wanted to take a little bit of your time to tell you about what’s new with TressFX Hair, and where the technology will be going in the near term.

 

Before we dive in, however, a quick primer on the history of TressFX Hair feels warranted to set the stage. TressFX Hair was the world’s first real-time hair physics simulation in a playable game. TressFX brought an end to the era of short hair, fixed hairstyles, helmets and other unseemly workarounds structured to disguise the limited nature of hair technology.

 

In fact, TressFX Hair represented the first occasion that a hair physics technology had ever made an appearance on the PC outside of limited technical demos. AMD and Crystal Dynamics collaborated extensively to develop and optimize the technology for PC gamers, and to give Lara Croft the unabashedly contemporary look she deserved for a new chapter in her story.

 

LEVERAGING AMD RADEON™ GRAPHICS ACROSS PLATFORMS

Over the past year, we at AMD have remarked on more than one occasion that bringing AMD Radeon™ Graphics and AMD APU technologies to life on multiple gaming platforms would pay dividends for gamers. That prediction came true with the remastered Tomb Raider: Definitive Edition for Xbox One™ and PS4™. In this revisiting, TressFX Hair made its debut outside of the PC space for the very first time.

tressfx.jpg

“Getting TressFX Hair running on PlayStation® 4 and Xbox One™ benefited from the fact that AMD’s Graphics Core Next (GCN) architecture powers the graphics of these platforms,” said Gary Snethen, Chief Technology Officer of Crystal Dynamics. “We were already familiar with GCN from our collaboration with AMD on Tomb Raider, and that experience was instrumental when it was time to bring TressFX Hair to life on consoles with Tomb Raider: Definitive Edition.”

 

Citing the Graphics Core Next architecture2 as the motivation to broaden the audience for TressFX Hair is an important occasion, as it validated in practice the idea that a common architecture makes it easier to share code across the platforms targeted by a development studio. From another perspective, it shows gamers that this cross-platform simplicity enables new headroom to explore in-game effects—effects that may have gone unused in past generations due to insufficient ROI.

 

TALE OF TWO HAIRCUTS
TressFX Hair was certainly impressive from a visual perspective, but less discussed is the operational efficiency that compels appreciation on both technical and philosophical grounds. To put a fine point on that, we wanted to illustrate the actual performance impact of AMD’s TressFX Hair contrasted against NVIDIA’s Hairworks.

 

In the below diagram, we isolated the specific routine that renders these competing hair technologies and plotted the results. The bars indicate the portion of time required, in milliseconds, to render the hair from start to finish within one frame of a user’s total framerate. In this scenario, a lower bar is better as that demonstrates quicker time to completion through more efficient code.

tfx_tr_perf.png

In the diagram, you can see that TressFX Hair exhibits an identically low performance impact on both AMD and NVIDIA hardware at just five milliseconds. Our belief in “doing the work for everyone” with open and modifiable code allowed Tomb Raider’s developer to achieve an efficient implementation regardless of the gamer’s hardware.

 

In contrast, NVIDIA’s Hairworks technology is seven times slower on AMD hardware with no obvious route to achieve cross-vendor optimizations as enabled by open access to TressFX source. As the code for Hairworks cannot be downloaded, analyzed or modified, developers and enthusiasts alike must suffer through unacceptably poor performance on a significant chunk of the industry’s graphics hardware.

 

With TressFX Hair, the value of openly-shared game code is clear.

 

WHAT’S NEXT FOR TRESSFX HAIR?

As Crystal Dynamics worked to bring TressFX to other platforms, we have been busy developing an even newer version of our award-winning hair tech. In November we announced “TressFX 2.0,” an update to the effect that brings several notable changes:

  • New functionality to support grass and fur
  • Continuous levels of detail (LODs) are designed to improve performance by dynamically adjusting visual detail as TressFX-enabled objects move towards and away from the player’s POV
  • Improved efficiency with many light sources and shaders via deferred rendering
  • Superior self-shadowing for better depth and texture in the hair
  • Even more robust scalability across GPUs of varying performance envelopes (vs. TressFX 1.0)
  • Modular code and porting documentation
  • Stretchiness now respects the laws of physics
  • and numerous bug fixes!

 

Starting with grass and fur, implementing realistic physics for these objects is rather similar to hair: treat each strand as a chain, group chains together, and then apply an external force. There is obviously some voodoo at work to make grass and fur behave more like grass and fur, and rather less like long hair, but the principles are so similar that they’re a logical extension to TressFX’s capabilities.

 

In designing TressFX 2.0, we addressed a notable issue in our hair physics simulation: stretchiness. Extreme linear and angular acceleration of a fast-moving or fast-turning character could cause the hair sim to appear unnaturally stretchy. In very rare instances, the physics model could even prevent the hair from ever recovering its original length.

 

While AMD and Crystal Dynamics were largely able to overcome this problem by performing rolling iterations of a “length constraint” system in Tomb Raider, we wanted to fix it permanently and more efficiently. TressFX 2.0 addresses this issue head-on through R&D and the creation of a new General Constraint Formulation, which is designed to be considerably more accurate than the old model at dealing with the forces of acceleration on a head of hair’s global and local (per-strand) level.
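For readers curious what a "chain with a length constraint" looks like in practice, here is a minimal 2D sketch. It uses Verlet integration followed by a single root-to-tip pass that restores each segment to its rest length. Both choices are assumptions made for illustration: this is the general family of technique described above, not the Tomb Raider code and not the new General Constraint Formulation.

```python
import math


def simulate_strand(points, prev_points, rest_length, force, dt):
    """Advance one strand (a chain of 2D points) by one time step.

    points[0] is the root and stays pinned. Returns (new_points, points)
    so the caller can feed them back in as (points, prev_points).
    """
    # Verlet integration: next = 2 * current - previous + acceleration * dt^2
    new_points = [points[0]]
    for (x, y), (px, py) in zip(points[1:], prev_points[1:]):
        new_points.append((2 * x - px + force[0] * dt * dt,
                           2 * y - py + force[1] * dt * dt))
    # Length constraint: walking from root to tip, pull each point back
    # so that it sits exactly rest_length away from its parent.
    for i in range(1, len(new_points)):
        ax, ay = new_points[i - 1]
        bx, by = new_points[i]
        dist = math.hypot(bx - ax, by - ay)
        if dist > 0.0:
            scale = rest_length / dist
            new_points[i] = (ax + (bx - ax) * scale, ay + (by - ay) * scale)
    return new_points, points
```

Without the constraint pass, a large force or a large time step would let the segments grow without limit, which is the stretchiness problem in miniature.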

Additionally, we overhauled the math behind the aforementioned “chain” structure of hair, grass and fur. We now use the Thomas Algorithm to evaluate the behavior of these objects, and this is notable because the Thomas Algorithm is very efficient and lightweight with respect to GPU number crunching. The end result for you: hair that behaves more realistically.
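The Thomas Algorithm itself is a textbook method for solving tridiagonal systems of equations, where each unknown is coupled only to its two neighbours, much like points along a chain. A minimal sketch is below; how TressFX 2.0 sets up its particular system is not covered here, so the inputs are generic.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) time.

    In row i, lower[i] multiplies x[i-1] (lower[0] is unused), diag[i]
    multiplies x[i], and upper[i] multiplies x[i+1] (upper[-1] is unused).
    Assumes no zero pivots, e.g. a diagonally dominant matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the lower diagonal.
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

It needs only one forward and one backward pass over the data, compared with the cubic cost of general Gaussian elimination, which is why it counts as lightweight.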

tressfx_math.png

Behind-the-scenes R&D work for TressFX 2.0; simplifying the TressFX Hair algorithm.

 

Next, we wanted to illustrate the impact self-shadowing (right) has on the texture and depth found in a head of hair:

shadowing2.jpg

Finally, we’ll take a look at TressFX 2.0’s LOD levels. As indicated earlier in this blog, a LOD level brings scalable detail to a system of 3D objects. As LOD-enabled objects move away from you, detail is reduced by a system that sustains the apparent quality for the player—you shouldn’t notice a thing if we do our job right! Inversely, when an object moves closer to you, the detail levels are slowly dialed up to maximum in a manner that, again, should be largely imperceptible to the player.

 

The primary benefit of a LOD system is an improvement in overall system performance. With LOD levels, the GPU needn’t render a full-detail head of hair when those details are beyond the visual acuity of the player’s position in the game world.
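A distance-based LOD can be as simple as the sketch below, which scales the fraction of strands rendered with camera distance. The linear falloff and every threshold in it are made-up values for illustration; the actual TressFX 2.0 LOD scheme may use different inputs and curves.

```python
def strand_fraction(distance, near=2.0, far=20.0, min_fraction=0.1):
    """Fraction of hair strands to render at a given camera distance.

    Full detail up to `near`, the minimum from `far` onwards, and a
    linear blend in between so that the change is gradual.
    """
    if distance <= near:
        return 1.0
    if distance >= far:
        return min_fraction
    t = (distance - near) / (far - near)
    return 1.0 + t * (min_fraction - 1.0)
```

Because the value changes continuously rather than in steps, there is no single distance at which the hair visibly "pops" between detail levels.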

lod_levels.png

THE FIRST TRESSFX 2.0 GAME

Beyond improvements to the effect, many gamers have asked about the next game to use TressFX Hair, and I’m pleased to say it’s Lichdom: Battlemage! The team at Xaviant is making healthy use of TressFX Hair 2.0, and had this to say about their decision to adopt TressFX:

 

“TressFX Hair is the most impressive advancement in visual fidelity in the past 24 months,” said Michael McCain, CEO and Founder, Xaviant. “TressFX proved that significant leaps in realism are still possible, even in an age where many have expressed skepticism about the very possibility of such a leap occurring. The beauty, simplicity and performance of TressFX—especially compared to its alternatives—made it an obvious choice to augment the commitment to image quality we have for Lichdom.”

 

Lichdom: Battlemage prominently uses TressFX Hair to render the female version of The Gryphon, a companion/aid to the player when a male protagonist is chosen. Her short bob haircut moves and shines just as you would expect real hair to do.

 

 

A MULTI-PLATFORM WORLD

TressFX Hair took the PC gaming world by storm, chiefly because it demonstrated that 3D graphics needn’t be incremental improvements—big and unexpected leaps can still happen! We were (and still are) very proud of that fact.

 

TressFX Hair also demonstrated the power of being transparent with your code when working with game developers. By collaborating so closely with Crystal Dynamics on TressFX Hair, we were able to make the technology efficient for all hardware, quickly incorporate the lessons and feedback from Tomb Raider™ into the 2.0 version of TressFX, and make those improvements publicly available in source code form for adoption in games like Lichdom: Battlemage!

 

Finally and excitingly for gamers everywhere, Crystal Dynamics’ decision to adopt TressFX Hair for Tomb Raider: Definitive Edition shows that cross-pollination between PCs and consoles is not only possible, but happening right now and improving the overall experience on all platforms.

 



Robert Hallock does Technical Communications for Gaming & Desktop Graphics at AMD.  His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


FOOTNOTES:

  1. All performance evaluation conducted on the following platform: Intel Core-i7 4960X, ASUS X79 Sabertooth, 16GB DDR3-1866, Windows 8.1 x64. AMD Driver: 14.2 Beta 1.3. NVIDIA Driver: 334.89. Settings: 1920x1080, maximum in-game quality preset.
  2. Select AMD Radeon graphics cards are based on the GCN Architecture and include its associated features (AMD PowerTune technology, AMD ZeroCore Power technology, PCI Express 3.0, etc.). Not all features are supported by all products; check with your system manufacturer for specific model capabilities.

3399_RADEON30_EMAILBNR_FNL.png

 

AMD and its retail partners would like to thank you for tuning into the #AMD30Live broadcast to celebrate 30 years of graphics and gaming with us! To commemorate the occasion, many of those fine retailers have assembled some killer deals for gamers all over the world. Check the list below and jump on the ones you gotta have!

 

RETAILER | THE DEAL | NEED IT?
Newegg | Get a free SSD when you buy an AMD Radeon™ R9 295X2 | Get it!
Overclockers.co.uk | Get a limited edition metal box Sapphire Radeon™ R9 295X2 and a FREE Superflower 1200W power supply | Get it!
TigerDirect | Save up to $250 when you buy a new PSU with an AMD Radeon™ R9 295X2 | Get it!
NCIX | Save up to $200 when you buy a new PSU with an AMD Radeon™ R9 295X2 | Get it!
Newegg | Save up to $225 when you buy a new PSU with an AMD Radeon™ R9 295X2 | Get it!
CyberPowerPC | Buying a rig? Get a free upgrade from an AMD Radeon™ R9 270X 2GB to XFX Radeon™ R9 280 3GB | Get it!
iBUYPOWER | PCs starting from $589 with a free upgrade to an AMD FX-6300 CPU | Get it!
TigerDirect | Save big on CPU+GPU bundles featuring AMD Radeon R9 290, R9 270 or R7 250 graphics | Get it!
LDLC | Great deals on an AMD Radeon™ R9 290 and power supply bundle starting at €379.95 | Get it!
CaseKing | Save big when you purchase an XFX brand AMD Radeon™ graphics card and a Leadex power supply | Get it!
CSL | Get a complete AMD Radeon™ R9 280X-based gaming PC from just €1099 | Get it!
Ulmart | Multiply retailer bonuses by five with the purchase of a Sapphire Radeon™ R9 290X | Get it!
Dante | Get a great deal on a Sapphire Radeon™ R9 295X2 and a Dell display! | Get it!
Flipkart | CHOOSE ONE: (1) Buy any AMD Radeon™ R9 Series graphics card and get select games free. (2) Buy any AMD Radeon™ R9 Series GPU and get a chance to win a free gaming headset, mouse, keyboard and mousepad! (3) Buy any AMD Radeon™ R9 Series GPU and get 20% off any PC game. Offer valid for 90 days. | Get 'em!

As we head into the dog days of summer, EA wants to give thanks to all their loyal players and would like to do that with a big ol’ AMD Radeon™-filled program called Battlefest! The program kicked off on July 9th, so you should get in on the action immediately after giving the below details a read!

battlefest.png

Here's what you need to know about Battlefest:

  • From July 9th through August 13th, there will be a daily contest called “Battleshots.” EA will ask you to send a screenshot in Battlefield 4™ based on a theme of their choosing. Screenshots get submitted here. Each day, they will crown a winning screenshot that will win an AMD Radeon™ graphics card, a DICE store gift card, and a Battlefield 4™ Premium membership on the platform of your choosing. (See official rules.)
  • Each Friday, the Battlefield™ team will be releasing a free camo unlock for all players.
  • To kick off the program on July 12th-13th there was a double XP weekend!
  • Each week of Battlefest will feature a global community challenge to reach an in-game goal. If the global BF4 community meets the goal, everyone gets a gold Battlepack. The first Community Mission begins July 15 with a challenge to reach 15 million revives by July 20. Good luck, soldiers!
  • Last but not least, the Stunt Video Competition runs July 14 through August 2. We want you to send us your best stunt video that can only be done in Battlefield 4™. The DICE team will pick the top 12 winners and then you, the loyal fans, will vote on the top three winners to receive a screamin’ fast AMD-based PC valued at $3100 US! The nine runners-up won’t go home empty-handed, either: each one will receive a high-end AMD Radeon™ GPU. (See official rules.)

That’s it! A month of “thank you” to everyone. Keep your eyes on the Battlefield™ Blog for even more Battlefest prizes and announcements in the weeks ahead!


Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


What is Project FreeSync?

Posted by rhallock May 29, 2014

accompanying-photo.jpg

Take a look at this photo. See that horizontal break halfway down?[1] That break is called a “screen tear.” You’d never tolerate it in your still photographs, but screen tears are a constant torment for PC gamers. They come and go in a flash, but that’s enough to annoy.

 

Vertical synchronization, or v-sync, is the traditional solution to screen tearing, but it introduces its own problems. Project FreeSync helps solve tearing without those problems or the use of proprietary technology. Project FreeSync is what gamers have been waiting for, but its benefits go beyond gaming.

 

MEET PROJECT FREESYNC

Computer monitors are refreshed at a constant rate, usually 60 times per second. On the other hand, game framerates are sporadic: the computer draws frames as fast as it can, and that varies constantly. Meanwhile, normal video content plays back at a steady rate, usually 23.976 frames per second.

As you can see, your content and your monitor are never in complete sync. That’s what causes the screen to "tear": the monitor is being fed a new frame before it’s finished drawing the last one. For a variety of reasons, games are the worst offender.

 

Traditionally, we solve the gaming problem with vertical synchronization, or v-sync. When v-sync is on, the computer lets the monitor set the pace. The PC delivers frames only at intervals that fit the monitor’s refresh rate exactly; the content and monitor are now synced and the tears are gone.

But there’s a catch: when the game action picks up and your PC's framerate dips, the monitor may not receive a new frame from the GPU in time for its next refresh, so the monitor displays the current frame a second time. Where you might have had a tear in the picture with a higher framerate, now you have stuttering or lag. It's a short lag, but it's obvious and intolerable to many gamers. There are even ways to alleviate the stuttering with v-sync, but these methods introduce "input lag," or a delay between the time the player moves the mouse and the movement appears on-screen. These scenarios demonstrate a traditional wisdom that every attempt to fix the basic problem of "tearing" introduces problems of its own.

 

But there is a solution that upends traditional wisdom: allow the monitor's refresh rate to vary (e.g. 9-60 times per second), and let that refresh rate be controlled by and synchronized to the graphics card. That very ability was proposed by AMD to VESA, the standards body that oversees the DisplayPort specification. Our proposal was accepted and integrated into the DisplayPort 1.2a specification as a feature going by the name of "DisplayPort Adaptive-Sync."
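The difference between the two approaches can be sketched with a toy timing model. This is an illustration under stated assumptions, not a description of the DisplayPort Adaptive-Sync protocol: the frame-ready timestamps are invented, the 60 Hz ceiling is assumed, and the panel's minimum refresh rate is not modeled.

```python
import math


def vsync_present_times(frame_ready_ms, refresh_hz=60.0):
    """With v-sync, a frame appears at the first fixed refresh tick
    at or after the moment it is ready."""
    period = 1000.0 / refresh_hz
    # The small epsilon keeps a frame that lands exactly on a tick on that tick.
    return [math.ceil(t / period - 1e-9) * period for t in frame_ready_ms]


def adaptive_present_times(frame_ready_ms, max_hz=60.0):
    """With a variable refresh rate, the display refreshes when the frame
    is ready, but never sooner than one minimum period after the last one."""
    min_period = 1000.0 / max_hz
    shown = []
    last = -min_period
    for t in frame_ready_ms:
        present = max(t, last + min_period)
        shown.append(present)
        last = present
    return shown


# A GPU finishing frames at 20 ms, 40 ms and 120 ms (a dip after the second frame).
ready = [20.0, 40.0, 120.0]
fixed = vsync_present_times(ready)
adaptive = adaptive_present_times(ready)
added_wait = [shown - t for shown, t in zip(fixed, ready)]
```

In this model the v-sync frames each wait for the next 60 Hz tick, so the delay between "ready" and "shown" varies from frame to frame, while the adaptive display shows each frame the moment it is ready.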


Thanks to AMD's help, monitors that support the DisplayPort Adaptive-Sync specification (and there will be a lot of them) will feature dynamic refresh rates. To actually utilize the features of such a monitor, however, you need a graphics card and a graphics driver that can leverage the Adaptive-Sync feature to manage how the content and monitor are synchronized.

Project FreeSync is AMD's name for the complete solution: a compatible AMD Radeon™ graphics card, an enabled AMD Catalyst™ graphics driver, and an Adaptive-Sync-aware display. Together, these three pieces will abolish tearing, eliminate stuttering, and greatly reduce input latency.

 


Jay Lebo is a Product Marketing Manager at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

 

FOOTNOTES:

1. This image is a simulation.

Mantle has consistently proven itself in a number of games and engines, to the extent that low-overhead APIs were one of the hottest topics at the 2014 Game Developer Conference. Microsoft announced DirectX® 12, a “console-like” iteration of their famous API that promises to streamline development and address programming overhead. Others talked low-overhead OpenGL™, and the practices that might need to be adopted to get there.

 

It’s important to highlight that AMD was an essential voice in both of these discussions, and the chronology plainly demonstrates that Mantle has been highly influential to both the theme and the existence of these discussions. Naturally, we are 100% behind any decision that makes the benefits of low-overhead game development accessible to more gamers and developers.

 

As DirectX® 12 games sit about 20 months away by Microsoft’s estimation (“holiday 2015”), there exists a long period of time from today where game developers must prepare their studios for a future when all major graphics APIs seek to extract the same sort of benefits that Mantle has pioneered. As the industry’s only proven low-overhead API for PC graphics, Mantle stands ready and waiting to address that gap.

 

Beyond that point, we expect DirectX® 12 to be every bit the robust and powerful solution Microsoft has promised it will be. We know that because we, too, are a member of the consortium Microsoft assembled to help shape this and every other version of their API since the 1990s.

 

When DirectX® 12 lands in late 2015, millions of AMD Radeon™ products based on the GCN Architecture will be compatible on day one. Thanks to Mantle and our presence in the console space, AMD will also stand alone with a graphics architecture that has received years of attention from developers working with low-overhead graphics APIs.

 

Above all, Mantle will present developers with a powerful shortcut to DirectX® 12, as the lingual similarities between APIs will make it easy to port a Mantle-based render backend to a DirectX® 12-based one if needed or desired. In addition, Mantle developers that made the bold decision to support our historic API will be well-educated on the design principles DirectX® 12 also promises to leverage. Finally, we will ensure that tomorrow’s game engines have an easy time of supporting a Mantle render backend, just as talented devs are comfortable with supporting multiple backends today to better address the needs of gamers.

 

port_times.PNG.png

 

IN CONCLUSION

Over the last seven months, we have been quite transparent about the origins of Mantle rooted in requests from developers, the problems we hope to solve with Mantle, and the effect it has had on this incredible industry. In our communications, even within this very blog, we’ve also been open and honest about the nature of our data and the areas we’re still actively addressing to make Mantle an even better solution for problems in game development. And today, we’ve shared with you our vision for the future of graphics, along with Mantle’s place in that future.

 

We heartily welcome discussion and analysis of the nature of Mantle in ways that comprehensively and accurately consider both CPU-bound and GPU-bound scenarios. We also invite inquisitive and philosophical investigation into why Mantle’s adoption has been so rapid, why Mantle is gaining traction amongst the largest and most experienced development studios, and how Mantle has shaped the direction of the graphics industry as a whole.

 

Whatever the future ultimately holds, we at AMD are simply proud the industry is joining us in making faster hardware through smarter software. That has been our prime philosophy since the day game developers came to us (as they did each hardware vendor) asking us for a better way. We’re glad others respect that philosophy, too, and we can’t wait to put our GPUs to work in support of that mission wherever it may go.


Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


The performance of Mantle

Posted by rhallock May 28, 2014

Mantle is an API made in service to the game development industry, optimized to handle the performance challenges most often encountered by developers. A key challenge for these developers is engine performance that has been constrained by poor multi-core scaling or processing overhead, particularly in scenes with a large number of objects.

 

The performance benefits of Mantle are very important to industry titans like DICE and Crytek. Figures 1 and 2 reflect the efficiency gains that have captured their attention.


mantle_in_thief.png

FIGURE 1: The Mantle graphics API is extracting untapped performance from existing hardware by removing bottlenecks in CPU-bound scenarios.

 

Figure 1 shows the built-in benchmark mode for Thief, which is designed to deliberately pressure systems with a high number of draw calls including characters, weather, carts, stalls, reflections, complex shadowing and many more objects.

 

Behind the scenes, each object represents a “draw call,” or a moment in time when the CPU and GPU must communicate to put something on the screen for your enjoyment. Historically, the quantity of draw calls—the image quality and detail provided to you—has reached a software limit before the hardware limit. The money you are investing in powerful hardware has been hamstrung by software inefficiencies!

 

Mantle is specifically designed to address this case by raising the draw call limit by up to 900%.[1] While increasing the draw call limit does not necessarily yield an equivalent jump in FPS, the data in figure 1 certainly demonstrates big performance gains can be achieved when you allow for better parallelization.
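As a rough check, the arithmetic given in footnote 1 can be reproduced in a few lines of Python. The per-draw timings and the 20% discount come from that footnote; the function and variable names are only for illustration.

```python
def draw_call_ratio(baseline_us, mantle_us, discount=1.2):
    """Ratio of per-draw CPU cost, reduced by a conservatism factor."""
    return baseline_us / mantle_us / discount


def draws_per_second(us_per_draw):
    """How many draws fit into one second at a given per-draw cost."""
    return 1_000_000 / us_per_draw


# Values quoted in footnote 1: 3.89 microseconds (DirectX 11) and 0.36 microseconds (Mantle).
ratio = draw_call_ratio(3.89, 0.36)
dx11_rate = draws_per_second(3.89)
mantle_rate = draws_per_second(0.36)
```

Rounded to two decimal places, the discounted ratio matches the 9.00 figure quoted in the footnote.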

 

HARDCORE GAMERS: A LOOK AT MANTLE & MULTI-GPU

Moving on to multi-GPU platforms, we enter into an area where hardware has been even more constrained by software, as limited multithreading capabilities must now be stretched thin across two graphics cards hungry to get data and do work—even at 1600p!

 

Visiting the “Angry Sea” mission in Battlefield 4™ with this configuration demonstrates a large performance delta between DirectX® 11 and Mantle, even when one of those graphics cards is using a driver allegedly tuned to improve performance by reducing driver overhead in DirectX® 11.


mantle_mgpu.png

FIGURE 2: The data reveals that Mantle better equips a processor to feed a hungry dual-GPU configuration than DirectX® 11.

 

We would be remiss if we didn’t put a fine point on this and remind you that this performance disparity represents a squandering of the money you invested in your hardware. Mantle isn’t just a way to increase detail or performance; it’s a return on your investment as a gamer.


ON THE TOPIC OF ROI

Another interesting trend arises from the data, in that the low-overhead benefits of Mantle are evidently unlocking the true performance of processors across the board, allowing contenders at very different prices to churn out approximately equal performance regardless of their retail cost. The importance of this trend, when extrapolated to an industry now focused on low-overhead APIs, cannot be overstated.

 

mantle_cpu_prices.png
FIGURE 3: The AMD FX-8350 is $850.00 less expensive than the Intel Core-i7 4960X, but it’s faster in Thief, a game equipped with the Mantle graphics API.[2]

 

Consider the implications of a new landscape where the budgetary choices you make for your PC have been democratized by software that totally deemphasizes the importance of your processor decision (and, by extension, the corresponding motherboard).

 

What would that do to the cost of your system when low-overhead APIs like Mantle become the norm? Would you purchase a less costly CPU and a more powerful graphics card instead? Would you simply reduce the cost of your system, perhaps by several hundred dollars? Little has been discussed on this topic, but we invite you to consider it in greater detail in your communities and articles.

 


Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


FOOTNOTES:

  1. Testing performed at AMD Labs by isolating API CPU performance. “Draw” defined as the minimum unique entity that can be rendered by an API draw command and a typical unique state associated with it. Mantle performed an average draw of 0.36 microseconds over two CPU threads. DirectX 11 performed an average draw of 3.89 microseconds over a single API thread and a single driver thread. Mantle results discounted by 20% for conservatism (i.e. 3.89/0.36/1.2 = 9.00). Test configuration: Intel Core 2 CPU X9650 at 3GHz, 4 GB of PC2-6400 RAM, AMD Radeon HD 7970 video card with 3 GB VRAM. [MAN-36]
  2. Pricing data obtained from Newegg.com on 05 May, 2014. Intel Core i7-4960X ($1049.99). AMD FX-8350 ($199.99).

We’ve said much about Mantle’s goals and merits of late, but now it’s time to listen directly to the brilliant people who are actually in the business of making games.

 

Dan Baker, a partner at Oxide Games and the former graphics lead for Sid Meier’s Civilization® V, knows a thing or two about driver overhead and graphics APIs. In an interview conducted by MaximumPC regarding their Mantle-enabled “Nitrous” engine, he explained that Mantle is a cure for an industry that is in need of greater parallelization.

 

“[…]APIs are still designed in this functional threading model where you have a series of processes that pass work back and forth to each other. The idea is that you have say, one thread for rendering, one thread for audio, one thread for gameplay, etc. This is really not a scalable way to build things,” Baker said.

 

“In situations where you have a shared L3 cache, you also create contention from all the different processes running, since they all access completely different memory. The industry continues to move to a job-based setup, where we have lots of tiny jobs that run asynchronously. This can now scale to a large number of CPUs, and we can fill up most of the previously unused time where one of the processors isn't doing something.”
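The job-based model Baker describes can be sketched in a few lines. This is a hypothetical illustration, not Oxide's Nitrous engine: the job functions and names are invented, and Python's thread pool only shows the scheduling shape (CPU-bound Python threads do not run in parallel the way native engine threads do).

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs, workers=4):
    """Run many small independent jobs on one shared pool.

    Instead of one dedicated thread per subsystem, every subsystem
    breaks its work into jobs that any idle worker can pick up.
    Results come back in submission order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, *args) for fn, args in jobs]
        return [f.result() for f in futures]


# Hypothetical per-entity jobs from two different subsystems.
def animate(entity_id):
    return ("animate", entity_id)


def cull(entity_id):
    return ("cull", entity_id)


jobs = [(animate, (i,)) for i in range(3)] + [(cull, (i,)) for i in range(3)]
results = run_jobs(jobs)
```

Because the jobs are small and independent, adding workers simply lets more of them run at once, which is the scaling property the quote is pointing at.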


On the topic of driver overhead, Baker’s insights were also particularly enlightening, noting that his team has been “completely limited in what [they] could do by driver overhead problems.” With Mantle, however, his team rapidly discovered that Mantle is such an elegant solution that it “dwarfs” the Direct3D 11 performance they could achieve in their engine with any hardware vendor.

 

The team at Firaxis, authors of the upcoming Civilization: Beyond Earth, unequivocally voiced the same opinion in a recent blog on Mantle: “Simply put, Mantle is the most advanced and powerful graphics API in existence.  It provides essentially the same feature set as DX11 or OpenGL, and does so at considerably lower runtime cost.”


For game developers, who live and breathe time-to-market pressures on their titles, Mantle has an added benefit. Mantle is their only opportunity to spend less time “tricking the system” to overcome software limitations, and more time getting on with the business of designing cool stuff for gamers. Our development partners have praised Mantle for reducing developmental complexity, which cannot always be said for API extensions or laborious code optimization efforts.

 

A notable proponent of this philosophy is Chris Roberts, CEO of Cloud Imperium Games and the brains behind Kickstarter sensation “Star Citizen.” In announcing support for Mantle, he noted that the API was key for achieving his vision without fighting the software to get there.

 

"AMD's Mantle will allow us to extract more performance from an AMD Radeon GPU than any other graphics API," Roberts said. "Mantle is vitally important for a game like Star Citizen, which is being designed with the need for massive GPU horsepower. With Mantle, our team can spend more time achieving our perfect artistic vision, and less time worrying about whether or not today’s gaming hardware will be ready to deliver it."


Firaxis also had something to say on this topic, noting that Mantle’s thinner abstraction layer empowers them to make better-informed game development decisions.

 

“The Mantle API is able to be backed by a very small, simple driver, which is thus considerably faster,” Firaxis said in their blog.  “It also means that this work, which must still be done, is done by someone with considerably more information.  Because the engine knows exactly what it will do and how it will do it, it is able to make design decisions that drivers could not.”


Dan Baker has a related philosophy, noting in his opening remarks (figure 1) at GDC14 that Mantle addresses fundamental development challenges that cannot be addressed by a retrofit of an existing API.

 

mantle_retrofit.png

FIGURE 1: Dan Baker of Oxide Games said it plainly when he presented this slide at the 2014 Game Developer Conference: you can’t retrofit old APIs.


Baker continued this line of thinking in a recent blog, saying: “[…] many of the most experienced developers, Oxide included, had for years advocated a lighter, simpler API that did the absolute minimum that it could get away with. We believed we needed a teardown of the entire API rather than some modifications of current APIs.”

 

Johan Andersson, technical director of the Frostbite engine at DICE, has also praised Mantle for making development easier. That was the central theme of his keynote presentation at the APU13 developer conference late last year, which opened with exactly that sentiment (figure 2).

 

mantle_johan.png

FIGURE 2: An opening slide from Johan Andersson's keynote presentation at the AMD APU13 developer conference.


In review, it is evident that Mantle is addressing a clear need within the industry to reimagine or reinvent the graphics API, and to flush out tired problems that have long stifled game development. Together, AMD and top game developers are collaborating not only to undertake that effort, but to share the results widely throughout the gaming industry so that gamers of every stripe might ultimately benefit.

 

 


Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.


Mantle 101

Posted by rhallock May 28, 2014

In the six months since Mantle’s January launch, it has quickly grown to be incredibly successful: seven game developers have pledged support, four game engines have adopted it and 20+ games will be Mantle-based. Within those figures, Crytek and AMD recently announced Mantle support in Cryengine, and AMD joined forces with 2K Games to bring Mantle support to Sid Meier’s Civilization®: Beyond Earth™. In addition, this month marked the start of the private beta program for the Mantle SDK, which boasts another 40 developers committed to exploring the benefits of our revolutionary API.

 

With all this momentum for Mantle, we thought it would be a good time to look forwards, backwards and sideways at Mantle to give a comprehensive view of how and why it has achieved overwhelming industry praise. Let’s start, however, by looking at how Mantle reclaims lost performance for gamers.

 

With a basic implementation, Mantle was designed to improve performance in scenarios where the CPU is the limiting factor (so-called “CPU-bound” cases). CPU-bound scenarios are commonplace in gaming, as existing APIs are laden with heavy validation overhead, and have difficulty scaling out to multiple CPU cores. By addressing these problems, games developed with Mantle improve the experience for the majority of global PC gamers that have entry-level and mid-range processors.

 

Mantle achieves this through:

  • Low-overhead validation and processing of API commands
  • Explicit command buffer control
  • Close to linear performance scaling from reordering command buffers onto multiple CPU cores
  • Reduced runtime shader compilation overhead

 

Mantle is also designed to improve situations where high resolutions and “maximum detail” settings are used, although to a somewhat lesser degree, as these settings tax GPU resources in a way that is more difficult to improve at the API level (so-called “GPU-bound” scenarios). While Mantle provides some built-in features to improve GPU-bound performance, gains in these cases are largely dependent on how well Mantle features and optimizations are being utilized by the developer. Some of those features include:

 

  • Reduction of command buffer submissions
  • Explicit control of resource compression, expands and synchronizations
  • Asynchronous DMA queue for data uploads independent from the graphics engine
  • Asynchronous compute queue for overlapping of compute and graphics workloads
  • Data format optimizations via flexible buffer/image access
  • Advanced Anti-Aliasing features for MSAA/EQAA optimizations

 

For even more detail, we recently published our first whitepaper on Mantle. This 11-page brief contains essential technical information on the form and function of the Mantle graphics API. In addition, you might also read these recent blogs by Oxide Games and developer Josh Barczak, which detail some specific and significant ways Mantle is improving their development experience.

 

Altogether, these mechanisms have proven unquestionably attractive for a legion of game developers, to the extent that the first-year adoption rate for the Mantle API is projected to exceed the adoption rate of DirectX® 11 (see fig. 1 below).

 

api_adoption.PNG.png

FIGURE 1 - Industry interest in a picture: the number of games in development with Mantle support through Q1 2015.

 

We’re thrilled to see so many industry luminaries in active development with Mantle in its beta phase, as these studios have a vested interest in making the ideal, high-performance API for PC graphics. Throughout this process, we are discovering new opportunities to reduce inefficiency, and we’re evolving how we make better use of the technologies we have on-hand today.

 

As the famous lyrics go: “you ain’t seen nothin’ yet!”

 


Robert Hallock does Technical Communications for Desktop Graphics at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

 

FOOTNOTES:

  1. http://en.wikipedia.org/wiki/List_of_games_with_DirectX_10_support
  2. http://en.wikipedia.org/wiki/List_of_games_with_DirectX_10_support
  3. AMD internal estimates

NOTE: This blog has been reprinted with permission from Firaxis Games. This blog originally appeared on the Firaxis dev-blog network on 28 April, 2014.

 

What is Mantle?

A “Graphics API” (Application Programming Interface) is a protocol that rendering engines use to send commands to a GPU (Graphics Processing Unit).  The API provides an abstract set of commands like “draw” which are translated by a GPU driver into commands which a particular device can understand.  At present, the two most well-known graphics APIs are DirectX and OpenGL.  DirectX is dominant on Windows, and OpenGL is dominant on many other platforms.

 

Mantle is a new graphics API developed by AMD, and supported on all newer AMD devices beginning with the Radeon HD 7000 series.

 

What is important about Mantle?

As game developers, we want to maximize our products’ reach while minimizing our development costs.  Why then, would we spend a great deal of time and effort in something that would benefit only a subset of our user base?  The idea of a platform-specific API, while not unheard of, was not often implemented.  After all, why would anyone write their application twice, when they could write it once?

 

In software, the only numbers of significance are 0, 1, and N.  Every cross-platform graphics engine that we have ever worked with has been designed around some kind of API abstraction which separates the game code on top from the graphics platform on the bottom.  If the abstraction layer is well built, then the cost of maintaining two graphics platforms is not worse than the cost of one.  It is also important to understand that, with the right architecture, graphics APIs are essentially a fixed cost.  Mantle has required an up-front investment, but the cost for future products to continue offering it will be considerably lower.
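A minimal sketch of the kind of abstraction layer described above might look like the following. The class and method names are hypothetical and do not correspond to the Firaxis engine or to any real DirectX or Mantle call; the point is only that game code talks to one interface while each graphics platform sits behind it.

```python
from abc import ABC, abstractmethod


class RenderBackend(ABC):
    """The interface the game code sees, independent of the graphics API."""

    @abstractmethod
    def draw(self, mesh_name):
        """Issue one draw for the named mesh and return a log entry."""


class DirectXBackend(RenderBackend):
    def draw(self, mesh_name):
        # Placeholder for platform-specific work; a real backend would call the API here.
        return f"dx11:draw({mesh_name})"


class MantleBackend(RenderBackend):
    def draw(self, mesh_name):
        # Same contract, different platform underneath.
        return f"mantle:draw({mesh_name})"


def render_scene(backend, meshes):
    """Game-side code: identical no matter which backend is plugged in."""
    return [backend.draw(m) for m in meshes]


scene = ["terrain", "unit", "ui"]
dx_log = render_scene(DirectXBackend(), scene)
mantle_log = render_scene(MantleBackend(), scene)
```

With this shape, supporting an Nth platform means writing one more backend class, while `render_scene` and everything above it stay untouched.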

 

Because Mantle is so new, and so different, the development cost is higher than normal.  In order to understand why it’s worth it, you need to understand just how important Mantle is.

 

What does Mantle buy you?

Simply put, Mantle is the most advanced and powerful graphics API in existence.  It provides essentially the same feature set as DX11 or OpenGL, and does so at considerably lower runtime cost.

 

The conventional wisdom in real-time rendering is that batches, or “draw calls”, are expensive.  On the PC, with current APIs, this notion is firmly rooted in fact.  This is a problem that has plagued engine and driver design since at least the DX9 era, and a large body of real-time rendering tradecraft is motivated by it (instancing, state sorting, texture atlasing, texture arrays, “uber-shaders”, to name a few).  Civilization, it turns out, requires a significant amount of rendering to generate our view of the world, and that in turn means we are required to make many, many more draw calls than you might expect.  Our bird’s-eye view of the world means that we have a lot more “stuff” on screen than is typical, and our UI (a rich source of draw calls) is considerably more complex than the average.
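
As a rough illustration of two of the techniques named above, instancing and state sorting, the following Python sketch groups objects that share a mesh and material into a single call and orders the result by material. The scene data and function names are made up for the example, and a real engine would do this against GPU resources rather than strings.

```python
from collections import defaultdict


def naive_draw_calls(objects):
    """One draw call per object, in scene order."""
    return [(obj["mesh"], obj["material"], 1) for obj in objects]


def instanced_draw_calls(objects):
    """Group objects sharing a mesh and material into one instanced call."""
    groups = defaultdict(int)
    for obj in objects:
        groups[(obj["mesh"], obj["material"])] += 1
    # Sorting by material keeps calls with the same state adjacent (state sorting).
    return sorted(
        ((mesh, material, count) for (mesh, material), count in groups.items()),
        key=lambda call: (call[1], call[0]),
    )


# Hypothetical scene: six objects, three distinct mesh/material pairs.
scene = [
    {"mesh": "tree", "material": "foliage"},
    {"mesh": "tank", "material": "metal"},
    {"mesh": "tree", "material": "foliage"},
    {"mesh": "tree", "material": "foliage"},
    {"mesh": "tank", "material": "metal"},
    {"mesh": "rock", "material": "stone"},
]

naive = naive_draw_calls(scene)        # six calls
batched = instanced_draw_calls(scene)  # three calls covering the same six objects
```

The work of building these groups is exactly the kind of engine-side effort that exists only because each individual call is costly.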

 

Mantle changes things by working at a lower level than its competitors.  Much of the work that drivers used to do on an application’s behalf is now the responsibility of the game engine.  This means that the Mantle API is able to be backed by a very small, simple driver, which is thus considerably faster.  It also means that this work, which must still be done, is done by someone with considerably more information.  Because the engine knows exactly what it will do and how it will do it, it is able to make design decisions that drivers could not.

 

Besides being more efficient, core per core, Mantle also enables fully parallel draw submission (this has been attempted before, but never with the same degree of success). Until now, the CPU work of processing the draw calls could only be executed on one CPU core.  By removing this limitation, Mantle allows us to spread the load across multiple cores and finish it that much faster.
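
The shape of parallel submission can be sketched in a few lines of Python. This is a structural illustration only: the function names are hypothetical, the "command buffers" are plain lists of strings, and Python threads will not show a real speedup for CPU-bound work the way native engine threads would.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(chunk):
    """Build a command buffer for one slice of the scene (the CPU-side work)."""
    return [f"draw:{name}" for name in chunk]


def submit_parallel(draws, workers=4):
    """Record command buffers on several threads, then submit them in order."""
    size = max(1, -(-len(draws) // workers))  # ceiling division
    chunks = [draws[i:i + size] for i in range(0, len(draws), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the frame stays deterministic.
        buffers = list(pool.map(record_commands, chunks))
    # Hypothetical "queue submit": concatenate the buffers in chunk order.
    return [cmd for buf in buffers for cmd in buf]


draws = [f"object{i}" for i in range(10)]
submitted = submit_parallel(draws)
```

Each worker records its own buffer independently, and ordering is restored only at the final submit step, which is what lets the recording work spread across cores.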

 

All of this means that Mantle has, quite literally, reduced the cost of a draw call by an order of magnitude.  This is an amazing technical achievement, and it is difficult for us to exaggerate the importance of these savings.  It is a disruptive technical development which will have far-reaching implications for PC gaming.  It will alter the dynamics of the market.  It will re-write portions of the real-time rendering book.  It will change the design of future APIs and engines and greatly enhance their capabilities.

 

What does this mean to the player?

By reducing the CPU cost of rendering, Mantle will result in higher frame rates on CPU-limited systems.  As a result, players with high-end GPUs will have a much crisper and smoother experience than they had before, because their machines will no longer be held back by the CPU.  On GPU-limited systems, performance may not improve, but there will still be a considerable drop in power consumption.  This is particularly important given that many of these systems are laptops and tablets.  The reduced CPU usage also means that background tasks are much less likely to interfere with the game’s performance, in all cases.
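
A simplified model shows why the gain depends on where the bottleneck is. The sketch below assumes CPU and GPU work overlap fully, so the slower of the two sets the frame time; that assumption and all of the millisecond figures are illustrative, not measurements.

```python
def frames_per_second(cpu_ms, gpu_ms):
    """Assume CPU and GPU work overlap, so the slower one sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def bottleneck(cpu_ms, gpu_ms):
    """Name the side that limits the frame rate under the same assumption."""
    return "CPU-limited" if cpu_ms > gpu_ms else "GPU-limited"


# Illustrative numbers only: CPU ms per frame before and after cheaper draw calls.
cpu_before, cpu_after = 25.0, 10.0

fast_gpu_ms = 12.5  # high-end GPU: the CPU was the bottleneck
slow_gpu_ms = 40.0  # modest GPU: the GPU was already the bottleneck

fast_before = frames_per_second(cpu_before, fast_gpu_ms)
fast_after = frames_per_second(cpu_after, fast_gpu_ms)
slow_before = frames_per_second(cpu_before, slow_gpu_ms)
slow_after = frames_per_second(cpu_after, slow_gpu_ms)
```

In this model the system with the fast GPU moves from CPU-limited to GPU-limited and its frame rate rises, while the system with the slow GPU keeps the same frame rate but spends less CPU time per frame.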

 

Finally, the smallness and simplicity of the Mantle driver means that it will not only be more efficient, but also more robust.  Over time, we expect the bug rate for Mantle to be lower than that of D3D or OpenGL.  In the long run, we expect Mantle to drive the design of future graphics APIs, and by investing in it now, we are helping to create an environment which is more favorable to us and to our customers.

 

What about these other vendors?

At present, the benefits of Mantle extend only to those customers who can run it.  We recognize that a large fraction of our customers will not have access to Mantle, and we do not intend to discriminate.

 

Our philosophy is to strive to use our customers’ machines to their fullest potential.  To the extent possible, DirectX customers will see the same images as Mantle customers, and we will provide DirectX customers with the highest performance that their systems are capable of.   It is precisely this motivation which impels us to offer Mantle to those customers who can use it, because their machines possess great untapped potential.   By tapping that potential, we hope to drive positive changes which will eventually spread to all of our other customers.

 

We expect that future graphics APIs will follow Mantle’s lead, and become much lower-level, out of necessity.  There is nothing preventing other vendors from following AMD’s example and offering low-level access to their own hardware, and we are perfectly willing to support such efforts.  One API is clearly better for us than many, but if having many allows us to maximize performance across the board, then that is where the future will take us.

 

In the irreverently paraphrased words of Sir Winston Churchill:

“If we can standardize it, all drawcalls may be free, and the life of the gamers may move forward into broad, sunlit uplands”.

 

That, dear friends, is why “I Am Mantle.”

 

Joshua Barczak and John W. Kloetzli Jr. are the Lead Graphics Engineer and Principal Graphics Programmer, respectively, for Sid Meier’s Civilization®: Beyond Earth™ at Firaxis Games. This posting contains their own opinion(s) and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

With the extreme popularity of AMD Radeon™ R9 Series graphics cards over the past few months, some gamers have found it hard to get their hands on these great products at suggested list prices.

 

We have good news! Supply has now caught up with demand, and for most retailers, street prices now match the original suggested list prices on the AMD Radeon™ R9 Series graphics cards.

 

If you’re unsure whether now’s the time to purchase an R9 Series graphics card, you needn’t worry. We’ve worked hard to make sure there are plenty of AMD Radeon™ graphics cards available at suggested manufacturer prices, so everyone can get one. Armed with Mantle, AMD TrueAudio technology and the recently refreshed Never Settle Forever game bundle, the AMD Radeon™ R9 Series graphics cards offer performance and value in every segment from $179.99 to $1499!¹

 

In addition to terrific availability and pricing, don’t forget these other great reasons to buy AMD Radeon™ graphics now:

  • AMD’s Mantle² is a groundbreaking graphics API that promises to transform the world of game development to help bring better, faster games to the PC. For example, Mantle increases Battlefield 4™ performance by up to 23.8%.³ All AMD Radeon™ R9 Series GPUs include Mantle support.
  • AMD TrueAudio technology⁴ is available on the AMD Radeon™ R9 290, R9 290X and R9 295X2 graphics cards, and gives sound engineers the freedom to follow their imaginations and the power to make their games sound as convincing as they look. Hear a demonstration, or discover the significance this technology holds in a stealth-based game title like Thief.
  • The AMD Gaming Evolved Client powered by Raptr brings your games together in one place and through a crowd-sourcing approach adds one-click optimizations, game progress and system performance tracking, social sharing tools, easy driver updates and a steady stream of Rewards points just for playing. Join over 5 million other gamers and install it now.
  • Our Never Settle Forever game bundle lets you choose the games you want most from a large selection of AMD Gaming Evolved titles. Get more details, including where to buy, right here.
  • AMD Eyefinity Technology⁵ makes it easy to achieve gaming areas way beyond HD by combining multiple monitors. You haven’t experienced real game immersion until you’ve gone beyond the boundaries of a single display.
  • Graphics Core Next architecture, which includes technologies like AMD PowerTune Technology and AMD ZeroCore Power technology⁶, gives you the processing horsepower you need when you need it while conserving power and reducing temperatures and noise the rest of the time. All AMD Radeon™ R9 Series GPUs include these features.


And here’s a look at what the press is saying:

 

  • “Overall, a great bundle update. I particularly like the addition of indie titles – kudos to AMD for that move.” – Rob Williams, Techgage
  • “Never Settle Forever continues to be a compelling choice offering incentives to potential AMD GPU buyers, and it’s nice to see AMD showing the indies — and their newer graphics cards — some love.” –Jason Evangelho, Forbes
  • “The pool of choices is bigger than ever and includes a couple of notable additions, including the upcoming Murdered: Soul Suspect and Thief. Also new are four prominent indie games...” –Scott Wasson, The Tech Report
  • “AMD's new Radeon R9 290 delivers quite impressive performance numbers. Right now, the R9 290 has the best price / performance ratio in the segment.” –"W1zzard," Techpowerup
  • “The Dual-X R9 280 OC plowed through our in game testing with great results making it a great choice for someone who is still on a limited budget and is gaming at 1080p.” –Wes Compton, LanOC Reviews
  • “In the end, this is the go-to card for ultra settings at 1080p, no question … If the performance delta isn’t enough to sway you, there’s word that the Never Settle Forever game bundle will be coming to the 200-series cards soon, too, making this card almost irresistible.” – Josh Norem, Maximum PC

 

There couldn’t be a better time to upgrade to the R9 Series of AMD Radeon™ graphics. It’s for gamers who demand the best.


Jay Lebo is a Product Marketing Manager at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

 


FOOTNOTES:

  1. Prices for AMD Radeon™ R9 Series cards on Newegg.com as of May 12, 2014
  2. Application support for Mantle is required.
  3. In playing Battlefield 4™ as of Feb 6, 2014, the AMD Radeon™ R9 290X GPU paired with an AMD FX-8350 CPU saw an increase in frame rates from 52.9 to 65.5 frames per second with Mantle at 1080p, ultra detail settings, anisotropic filtering and anti-aliasing on. 2TB HDD, 4GB memory, AMD Catalyst 13.11 Beta 6 Performance Driver. MAN-3
  4. AMD TrueAudio technology is offered on select AMD Radeon™ R9 and R7 200 Series GPUs and is designed to improve acoustic realism. Requires an enabled game or application. Not all audio equipment supports all audio effects; additional audio equipment may be required for some audio effects. Not all products feature all technologies—check with your component or system manufacturer for specific capabilities.
  5. AMD Eyefinity technology supports up to six DisplayPort monitors on an enabled graphics card. Supported display quantity, type and resolution vary by model and board design; confirm specifications with manufacturer before purchase. To enable more than two displays, or multiple displays from a single output, additional hardware such as DisplayPort-ready monitors or DisplayPort 1.2 MST-enabled hubs may be required. A maximum of two active adapters is recommended for consumer systems. See www.amd.com/eyefinityfaq for full details.
  6. AMD PowerTune and AMD ZeroCore Power are technologies offered by certain AMD Radeon™ products, which are designed to intelligently manage GPU power consumption in response to certain GPU load conditions. Not all products feature all technologies – check with your component or system manufacturer for specific model capabilities.
