
Futuremark responds to accusations of bias in new DirectX 12 Time Spy test

Question asked by kingfish on Jul 21, 2016

Futuremark responds to accusations of bias in new DirectX 12 Time Spy test | ExtremeTech

"Last week, Futuremark released Time Spy, a new DirectX 12 benchmark that takes full advantage of DX12’s features and capabilities, including asynchronous compute. While the release of a new benchmark is typically of modest interest, there’s been a great deal of confusion, uncertainty, and doubt over Time Spy’s benchmark results and what those results mean. Futuremark has since published an updated and expanded guide to how the benchmark functions and what it’s designed to do.

Much of the confusion on this topic is related to what Time Spy tests and how it implements support for asynchronous compute in DirectX 12. A graph from PC Perspective’s test results from last week will illustrate the question:


"The Time Spy-related questions can be broadly summarized as follows:

  • Why does Nvidia’s Pascal architecture gain performance in Time Spy when it shows no performance gain from asynchronous compute in other benchmarks?
  • Why doesn’t Futuremark implement optimized, vendor-specific code paths for AMD and Nvidia? Isn’t this a functional requirement of DX12?

According to Futuremark, Time Spy uses a new engine specifically architected for DirectX 12. The benchmark was designed over a period of two years of active collaboration with Intel, AMD, and Nvidia, all of whom have had source code access and have contributed best practices and technical understanding. Furthermore, all of Futuremark’s partners have signed off on releasing the benchmark in its current form.

I’d like to note that this public explanation lines up with what we’ve heard privately. Neither AMD nor Nvidia’s PR teams are known for their reticence when it comes to attacking benchmarks they perceive as flawed or unfair, and neither company has anything negative to say about Time Spy.

Futuremark goes on to say it has considered implementing vendor-specific code paths, but that its partners are invariably against the practice. It writes:

In many cases, an aggressive optimization path would also require altering the work being done, which means the test would no longer provide a common reference point. And with separate paths for each architecture, not only would the outputs not be comparable, but the paths would be obsolete with every new architecture launch.

3DMark benchmarks use a path that is heavily optimized for all hardware. This path is developed by working with all vendors to ensure that our engine runs as efficiently as possible on all available hardware. Without vendor support and participation this would not be possible, but we are lucky in having active and dedicated development partners.
Ultimately, 3DMark aims to predict the performance of games in general. To accomplish this, it needs to be able to predict games that are heavily optimized for one vendor, both vendors, and games that are fairly agnostic. 3DMark is not intended to be a measure of the absolute theoretical maximum performance of hardware.

This statement caused some controversy in the user community because a joint AMD-Nvidia presentation at GDC 2016 prominently claimed that there was no point to implementing DirectX 12 unless you planned to also implement IHV-specific code paths."



Vendor-optimized paths are risky

Back in 2008, when I worked for Ars Technica, I wrote a review of the VIA Nano. While testing that CPU, I decided to use a VIA-provided utility to change the CPUID string that identifies the microprocessor. Most of the test scores didn’t change, but the memory subsystem score changed drastically.


The Nano (AMD) and Nano (Intel) labels mean that the chip identified itself as having been manufactured by AMD and Intel, respectively. The Intel code path is 47% faster than the default path.

Changing the CPUID string improved the Nano’s memory score by 47% because a vendor-specific code path had been implemented and certain optimizations had been tied to it. Futuremark has always insisted this was an accident rather than a deliberate attempt to skew benchmark results in favor of Intel. When Futuremark announced PCMark 8, I asked the company what had happened after the PCMark05 controversy. Futuremark informed me it had overhauled its developer programs and optimization strategies to avoid vendor-specific, hand-optimized code paths because of the fallout surrounding the PCMark05 issue.

It would be hypocritical in the extreme to attack Futuremark for using Intel-specific optimizations in one test, only to turn around and attack it for not implementing AMD- or Nvidia-specific optimizations in a different test. If I have to choose between a general-case, all-around fair test that doesn’t include vendor-specific optimizations for any architecture, and a benchmark that’s been optimized to an unknown degree by multiple vendors, I’ll take the former every time, even if it means missing out on seeing the absolute best-case scenario for any given GPU.

A program like Time Spy, Fire Strike, or 3DMark 11 is designed to serve as a general, representative vehicle for measuring performance in a given series of tests. Futuremark’s customer base isn’t limited to individual gamers. It also sells site licenses to other companies that want to measure their hardware’s general performance in a standardized benchmark. 3DMark versions also tend to have longer shelf lives than game benchmarks. Most reviewers refresh their game tests on a 1-2 year cycle, while 3DMark versions typically last three years or more. Writing and updating a benchmark that performs decently well on multiple architectures without being specifically optimized for any single target may prevent any one company from showcasing a specific feature. But it also provides a framework that multiple companies can rely on for qualifying their own designs.