
General Discussions

rainingtacco
Challenger

AMD when will you fix your terrible driver overhead for DX11?

You really want us to play the newest games with DX12, don't you?

52 Replies

@hitbm47  Radeon Overlay, image sharpening, Chill, etc. are all universal features, the kind of thing you can't have working with only, say, Vulkan, because that would make them next to useless. Building something into the driver that intercepts an excessive number of calls on one thread and splits them across different threads is not a universal approach; it is specific to DX11 only, which is why AMD deem it a waste of resources, or at least extremely low priority, despite the substantial improvements that could be gained in heavily single-threaded DX11 situations. Yes and no regarding DX11 command lists: AFAIK you need both hardware and software support for them to work effectively, and in AMD's case, from what I know, they could never get it working properly driver-side.
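(For reference, this is roughly how an application can ask the D3D11 runtime whether the installed driver implements command lists natively; a minimal sketch with most error handling omitted, not tied to any particular vendor's driver.)

```cpp
// Minimal sketch: query whether the installed driver supports native DX11
// command lists (D3D11_FEATURE_THREADING). If DriverCommandLists is FALSE,
// the D3D11 runtime emulates them in software, which is the situation
// described above for AMD's DX11 driver.
#include <d3d11.h>
#include <cstdio>
#pragma comment(lib, "d3d11.lib")

int main()
{
    ID3D11Device* device = nullptr;
    D3D_FEATURE_LEVEL fl;
    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                   nullptr, 0, D3D11_SDK_VERSION,
                                   &device, &fl, nullptr);
    if (FAILED(hr)) return 1;

    D3D11_FEATURE_DATA_THREADING threading = {};
    device->CheckFeatureSupport(D3D11_FEATURE_THREADING,
                                &threading, sizeof(threading));

    std::printf("Driver command lists:      %s\n",
                threading.DriverCommandLists ? "yes" : "emulated by runtime");
    std::printf("Driver concurrent creates: %s\n",
                threading.DriverConcurrentCreates ? "yes" : "no");

    device->Release();
    return 0;
}
```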

Hi @ketxxx 

I agree with what you are saying, but I still wonder what the issue is, because some DX11 games perform well (or at least max out CPU usage) even on low-IPC CPUs, while others such as Far Cry perform horribly (maxing out only one core) if you do not have a relatively modern Intel CPU.

My current guess is that the only way to get command lists to work is to move more of the driver onto the CPU, which increases CPU overhead; on a weak quad-core you might actually lose performance, but on a weak eight-threaded CPU you could gain some.

Maybe Nvidia's command lists only benefit many-threaded, low-IPC CPUs in single-threaded cases (for example, 4 cores / 8 threads); this is something one would have to test. AMD's driver might be slightly lighter on low-thread, low-IPC CPUs (for example, 2 cores / 4 threads), since DX11 can still only submit draw calls through one or two cores, whereas DX12/Vulkan can submit on all cores.
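(To illustrate what is being discussed, here is a simplified sketch of the DX11 deferred-context / command-list pattern; the function names and worker count are made up for the example, and a real renderer would reuse contexts and record actual draw calls.)

```cpp
// Simplified sketch of DX11 multithreaded command recording: worker threads
// record into deferred contexts, and the main thread replays the resulting
// command lists on the immediate context. Without native driver support the
// runtime still funnels the real work through one thread.
#include <d3d11.h>
#include <thread>
#include <vector>

void RecordWorker(ID3D11Device* device, ID3D11CommandList** outList)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... record state changes and draw calls on `deferred` here ...

    deferred->FinishCommandList(FALSE, outList);   // bake into a command list
    deferred->Release();
}

void SubmitFrame(ID3D11Device* device, ID3D11DeviceContext* immediate)
{
    const int kWorkers = 4;
    std::vector<ID3D11CommandList*> lists(kWorkers, nullptr);
    std::vector<std::thread> threads;

    for (int i = 0; i < kWorkers; ++i)
        threads.emplace_back(RecordWorker, device, &lists[i]);
    for (auto& t : threads)
        t.join();

    // Only the immediate context actually submits to the GPU.
    for (ID3D11CommandList* cl : lists)
    {
        immediate->ExecuteCommandList(cl, FALSE);
        cl->Release();
    }
}
```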

Kind regards


@hitbm47  Sometimes the problem isn't with the driver at all; the game engine and how well the developers have optimised it play a huge role as well. The Metro series is another good example to use here (it's really the only example you need, due to how the engine is optimised). The 4A engine is highly optimised for multi-threaded workloads, BUT the engine does not perform well on AMD hardware (even my highly optimised and OC'd 8GB RX 580 got convincingly beaten by my much, much older GTX 980 @ 1.45GHz core), either because 4A did not properly optimise for AMD or because AMD's driver-level DX11 multi-threading is just that bad. Even with a higher CPU workload from offloading some things to the CPU for DX11 (and probably inherently increasing overhead - I'm not 100% on that), AMD's DX11 driver would probably improve wholesale, because when it sucks (which is more often than not) it REALLY sucks. There's an article and a review I wrote with information on this to show what I'm talking about: https://www.win-raid.com/t4815f51-Article-Radeon-RX-GB-amp-GeForce-GTX-GB-AMD-and-nvidia-the-bottom-... and https://www.win-raid.com/t4830f51-PowerColor-RX-GB-Red-Dragon-Review.html

Hi @ketxxx 

I went through most of the first link you posted, the RX 580 vs GTX 980 piece, and I think it is a very good article comparing the shortcomings of the two. It's a shame that, all these years later, @AMD still does not seem to have done anything about DirectX 11 performance.

I took some screenshots of Deus Ex: Mankind Divided at 720p high as well to demonstrate the DX11 performance. Even more disturbing is the DX12 performance; as you said, it seems they simply wrapped DX11 into DX12 just to get improved multi-GPU support.

RX 480:

[screenshot: RX 480 DX11]

GTX 1060 3GB:

[screenshot: GTX 1060 DX11]

I even tried forcing max CPU load by setting my FX 8350 to 2 modules / 4 ALUs, and it proved my guess about the Nvidia driver wrong:

RX 480:

[screenshot: RX 480 Quad Core]

GTX 1060:

[screenshot: GTX 1060 Quad Core]

Kind regards


As I said in the article @hitbm47, it would be unfair to single out AMD for bad driver optimisation in Deus Ex MD, as the game is built on a pretty **bleep**ty engine, but it does show that intercepting an overabundance of thread calls at the driver level and offloading some to a new thread (as Nvidia's driver does) nets a decent performance improvement. You'd be able to show this better by running a test with all CPU cores/threads enabled for Deus Ex MD and Metro Last Light Redux, then running the tests a few more times with the CPU limited to 6, 4, and 2 cores until you finally get to just a single core being active. You can plot with a reasonable degree of accuracy how efficient the AMD and Nvidia drivers are when forced to run single-threaded versus in multi-threaded situations this way. Of course, downloading and configuring HWiNFO64 and RivaTuner Statistics Server for all the detailed info they can give would really help highlight these differences.

Hi @ketxxx 

Excuse me for only replying now. I must say that I think Deus Ex actually has an engine that scales pretty well; apparently it is one of the Hitman Glacier engines. It is very unfortunate that it and Total War: Warhammer seem to simply use a DX12 layer on top of the base DirectX 11 implementation to get better multi-GPU scaling rather than CPU improvements. Recently I also tried Human Revolution, which ran on a different engine and seemed to scale even better.

Well, I think the most important variable to monitor is the number of draw calls, or the polygon information sent to the GPU, and I do not know whether HWiNFO64 can do that? I have used it to monitor VRM temps on my motherboard and such. In addition, I can only disable cores per module (two at a time) in my BIOS, and using the affinity tool in Windows does not yield the same results, since background tasks can then still run on the free cores, along with possibly some of the context deferring.
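(For what it's worth, the Windows affinity tool just sets the process affinity mask; below is a minimal sketch of doing the same thing programmatically. The PID and mask are placeholder values, and as noted above this only restricts the game process itself, so background tasks on the rest of the system are unaffected, unlike disabling modules in the BIOS.)

```cpp
// Minimal sketch: pin an already-running game process to the first four
// logical processors. This is essentially what Task Manager's "Set affinity"
// does; other processes keep using all cores, which is why it is not
// equivalent to disabling cores/modules in the BIOS.
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD pid = 1234;                     // placeholder PID of the game process
    DWORD_PTR mask = 0x0F;                // logical processors 0-3

    HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION,
                              FALSE, pid);
    if (!proc)
    {
        std::printf("OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    if (!SetProcessAffinityMask(proc, mask))
        std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());

    CloseHandle(proc);
    return 0;
}
```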

Here is a link on how DICE improved performance on GCN. I do not know if it rivals Nvidia, but I have barely been CPU-limited in any of Battlefield 3, 4 or 1, since they tend to utilize all cores before bottlenecking: https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf

Kind regards


@hitbm47  I'm not sure what you're getting at here; my comment was about the build of the engine Deus Ex Mankind Divided used not scaling well with the CPU, as you can see here: https://www.techspot.com/review/1235-deus-ex-mankind-divided-benchmarks/page5.html. It has nothing to do with the current build of the Glacier engine, which, as one would expect, has been improved since Deus Ex MD but still suffers from poor CPU scaling; it's just less pronounced because much more powerful CPUs and GPUs brute-force it: https://www.overclock3d.net/reviews/software/hitman_3_pc_performance_review_and_optimisation_guide/4

If the problem here in Hitman 3 is draw distance, then it's the crappy DX11 multi-threading issue, with most (all?) calls being put in a DX12 wrapper, rearing its head again. If not, then the other explanation is a physics-based one. I don't know what physics engine Glacier uses, but for the sake of hypothesising I'll guess it's Havok; being CPU-based physics, the more cores/threads the better in theory, but with performance maxing out pretty much at 4 cores / 8 threads, the physics system itself was possibly only coded to utilise up to 4 cores and 8 threads... hence nerfing performance on any CPU packing more substantial hardware than that. It's also entirely possible it's something else.

No need to do any testing when those links do it all and show quite well what's been talked about. HWiNFO64 I was suggesting to use with RTSS to accurately plot CPU usage on 2/4/6/8+ cores, so you could see fairly accurately what impact a poorly threaded GPU driver has on an engine that heavily relies on multi-threading on both the GPU and CPU sides; for the record, the Metro Last Light Redux engine scales up to (and possibly beyond) at least 8 cores: https://www.tomshardware.com/uk/reviews/multi-core-cpu-scaling-directx-11,4768-4.html. That link shows how CPU cores impact performance with a GTX 1080, so there is a baseline to work from to crudely judge how efficient AMD's DX11 multi-threading is. I'd guess it's around 7-10% worse on average, which is quite noticeable, more so today than it once was: with the advent of extremely high refresh rate monitors, people can now use what was once untapped performance, as we aren't limited to 60Hz without screen tearing anymore.

Hopefully that all makes sense, it's insanely late here.


Hi @ketxxx 

I was not trying to contradict you, and I fully agree with you that AMD's DirectX 11 multithreading is sub-par. I was only trying to point out that, from what I could see with my FX 8350 and MSI Afterburner + RTSS, DXMD mostly scales over the available cores before it starts bottlenecking throughput in DirectX 11 (whereas other DX11 games do not even max out a single core on the CPU before causing low GPU usage), but it does indeed scale more acceptably on Nvidia in DX11. In addition, the DX12 path is very disappointing because it scales even worse on the CPU.

Thank you for providing those links. I am now more interested in testing Metro on my system, since according to that article they programmed it in a way that would benefit the currently poor AMD DX11 driver, by dedicating one core purely to draw-call submission. I have also read a Criterion interview saying that doing certain things on a single core improves latency / response times when a game is running at a certain frame rate.
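(The "one core for submission" idea boils down to a single dedicated render thread draining a queue of commands produced by the other threads. The sketch below only illustrates that pattern; the class and names are hypothetical, not 4A's or Criterion's actual code.)

```cpp
// Rough sketch of a dedicated submission thread: game/worker threads push
// high-level render commands into a queue, and one thread is the only one
// that talks to the graphics API, so driver overhead stays on a single,
// predictable core.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class RenderThread
{
public:
    RenderThread() : worker_([this] { Run(); }) {}

    ~RenderThread()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from any game thread: enqueue a batch of draw calls.
    void Submit(std::function<void()> renderCommand)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(renderCommand));
        }
        cv_.notify_one();
    }

private:
    void Run()
    {
        for (;;)
        {
            std::function<void()> cmd;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return quit_ || !queue_.empty(); });
                if (quit_ && queue_.empty()) return;
                cmd = std::move(queue_.front());
                queue_.pop();
            }
            cmd();   // the only place API/driver calls are issued
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool quit_ = false;
    std::thread worker_;   // declared last so the other members exist first
};
```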

From my testing, since Hitman 2 (2018) in DX12 mode, IOI have most likely updated the engine to support multithreaded draw calls, since I get a considerable CPU-side framerate boost in Hitman 2 that I did not get in Hitman (2016) in DirectX 12 mode. Furthermore, IOI uses Nvidia PhysX for the CPU physics calculations; I noticed this a year ago already when I was looking through the Hitman Absolution, Hitman 2016 and Hitman 2 2018 folders. You can also notice in-game that shooting into the floor leaves particles on the ground without them disappearing, except they are not movable afterwards, due to not running on the CUDA version of PhysX.

Another thing I was getting at is that I wonder what other optimizations AMD has in place for DirectX 11 that most companies might not be using, since, for example, Battlefield 4 runs at 70-80 FPS (with dips to 60 FPS) in multiplayer with Ultra settings on my FX 8350 and RX 480 in DX11, which is impressive since this game has huge open-world maps. That is why I posted that DICE document with the optimizations for GCN.

In addition, most Call of Duty games scale incredibly well on my system in DX11 (except for Ghosts), which also raises the question of why, for example, Crysis 3 would dip to 19 FPS specifically in Windows 10 on my system without maxing out any component in that scene.

I have also recently narrowed down and again reported that AMD erroneously deleted all their DirectX 9 Unreal Engine 3 optimizations beginning with driver version 17.7.2, where they hastily removed "Radeon Additional Settings" from the drivers, which clearly had some game optimizations linked to it that went lost. You can see how it affects UE3 DX9 games here, with a 64% decrease in minimum FPS: https://community.amd.com/t5/opengl-vulkan/unreal-tournament-3-rx-480-still-performs-horribly-can-it... . I have been reporting this issue to AMD for FOUR years, which is now completely unacceptable with it still not fixed and Radeon support ending for GPUs that used to run these games flawlessly; for the last five years these games have not run properly without something like at least an i5 4670K, and now they will never receive the fix they without a doubt whole-heartedly deserve!

This has been affecting me for four years, starting a few months after my purchase of the RX 480, and is deeply disappointing and unacceptable from AMD, considering AMD sold low-IPC CPUs at the time.

Edit:

Hi @ketxxx, so I was now able to test the Epic Games Store Metro Last Light Redux on my RX 480 and FX 8350, and the game scales incredibly well on the 17.7.1 drivers (before AMD erroneously lost the UE3 optimizations and who knows what else); I was averaging between 60-80 FPS at 1080p (Very High settings, with Motion Blur & Tessellation on Normal and AF x16). Maybe this is something else AMD messed up with the removal of "Radeon Additional Settings" in 17.7.2. Your RX 580 will be able to use 17.7.1 as well if you want to re-test, since you should get even slightly higher FPS than me with that Ryzen you were using. I have not tested it on 21.6.1 yet, since I am trying to see what else AMD messed up.

During the first part that builds up to chasing the alien (or the Dark One) I was getting 100-120 FPS consistently with VSync disabled.

Kind regards


@hitbm47 Some of your issues might be due to Windows itself and the CPU scheduler it uses. The only way to know if that's part of the problem is to test your hardware on Win 7 as well. Good luck getting AMD to fix DX9 problems; as best I know they stopped maintaining DX9 in their driver some time ago, and whatever DX9 optimisations still work in the driver will probably be torn out soon, as any architecture pre-RX 400 series entered legacy status as of 21.6.1. I get that this sort of thing happens eventually, but the RX 400 series is GCN; there's no reason not to continue supporting older GCN-based cards like the 7000 series. Not doing so is like ordering a steak and watching the waiter cut it in half and take it back to the kitchen "because reasons".

Nice one AMD, drop support for a bunch of GPUs right at a time nobody can buy them.

Hi @ketxxx 

With the CryEngine games we have definitely confirmed in the past that it is due to using Windows 10 instead of 7, and setting compatibility mode to 7 on 10 does not help either. I have tried tinkering with timer settings in the past, but it does not help, and Windows 10's timer seems better in general anyway.

RX Vega is also a revision of GCN as far as I know, but I agree 100% that they have ended support at a very bad time, probably in the hope of forcing new GPU purchases for a cash grab, but it will likely backfire at the moment.

In my opinion there was no good reason to end support for the R9 300 series, but I can understand the R9 200 series, since those were simply higher-clocked HD 7000 cards with probably some thermal improvements. The R9 300 series was quite a bit stronger from the statistics I saw.

Do you know of any AMD DirectX 9 DLLs from an older driver that one could paste into a DirectX 9 game's folder? People have had success with Mantle in this way, but usually you need a D3D9.dll proxy, which I can't find in AMD's drivers. I tried these files but they were not being used: aticfx32.dll, aticfx64.dll, atidxx32.dll, atidxx64.dll
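(For context, a D3D9.dll proxy is just a DLL that exports Direct3DCreate9 and forwards it to the real runtime, which is why it has to be a purpose-built file rather than one of the aticfx/atidxx driver DLLs; as far as I know those are loaded by the runtime/driver stack itself, not by the game by filename. A bare-bones, hypothetical sketch:)

```cpp
// Bare-bones sketch of a d3d9.dll proxy: the game loads this file from its
// own folder instead of the system d3d9.dll, and the call is forwarded to
// the real runtime. Build as a DLL named d3d9.dll matching the game's
// bitness; on 32-bit builds use a .def file so the export name stays
// undecorated.
#include <windows.h>

struct IDirect3D9;   // forward declaration is enough for pure forwarding

typedef IDirect3D9* (WINAPI *PFN_Direct3DCreate9)(UINT);

static HMODULE LoadSystemD3D9()
{
    static HMODULE real = nullptr;
    if (!real)
    {
        char path[MAX_PATH];
        GetSystemDirectoryA(path, MAX_PATH);
        lstrcatA(path, "\\d3d9.dll");      // always load the genuine runtime
        real = LoadLibraryA(path);
    }
    return real;
}

// The export most DX9 games need; anything you wanted to intercept or tweak
// would go here before forwarding to the real function.
extern "C" __declspec(dllexport) IDirect3D9* WINAPI Direct3DCreate9(UINT sdkVersion)
{
    HMODULE real = LoadSystemD3D9();
    if (!real) return nullptr;
    auto create = reinterpret_cast<PFN_Direct3DCreate9>(
        GetProcAddress(real, "Direct3DCreate9"));
    return create ? create(sdkVersion) : nullptr;
}
```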

Kind regards


As for the CPU exacerbating the AMD DX11 problem:

Before Zen 3, AMD had really poor core-to-core latency. This can exacerbate the issue when faced with frequent context switching. Imagine that on the thread where draw calls are computed there are also other compute-heavy tasks. Since draw calls are computed on a single thread in the AMD driver, the driver (depending also on the game engine) can switch the context from the clogged thread to another, less congested thread. But this comes with a latency penalty, and the switching can occur several times before the computation is complete. This is why AMD GPUs had less "loading" stutter on Intel CPUs at the time in DX11 CPU-heavy scenarios. The problem would not be so severe if the draw-call pipeline were always offloaded to the nearest thread, say from thread 0 to thread 1, but that is not always the case if thread 1 is also pegged.

As you can see, on Zen 2 switching between cores incurs 30ns of latency, versus 7ns if it is between the two virtual threads of the same core (hyperthreading/SMT). In contrast, switching between CCXs incurs massive latency and should always be avoided in a critical computation pipeline like draw calls.

Zen 3 improved the situation.

And here are the Intel CPUs (alongside Ryzen too).

Intel has comparable inter-core latencies, slightly lower than Zen 2 and slightly higher than Zen 3, BUT the latencies are EVEN! There is no CCX boundary to cross and no huge added latency malus. This is why, when you compare Zen 2 vs Intel 9th/10th/11th gen in the AIDA latency benchmark, you see roughly half the latency on Intel. Now imagine a game doing this context switching across CCXs, as you usually will with CPUs that have fewer than 8 logical cores (for example a Zen 2 Ryzen 5) under a heavy CPU scenario, with draw-call computation on a single thread as in the AMD DX11 driver. It's a disaster; hence AMD CPUs exacerbate the issue of their own poor drivers. This was VASTLY improved with Zen+/Zen 2 compared to Zen/Bulldozer; Bulldozer and even Zen were a complete disaster, hence the poor performance in CPU-heavy scenarios and the unbearable stutter. Anyone who bought an AMD CPU back then was completely duped a few years later, when multithreading became more common even with DX11 and AMD refused to improve their drivers. They were and still are in hot water, hence they pushed Mantle, Vulkan and DX12 SO HARD. Back in the Bulldozer era even the BIGGEST AMD fanboys admitted it was better to run an Intel CPU with an AMD GPU. But the obvious solution, back then and still today, is ALWAYS to buy Intel and Nvidia until AMD clearly improves. This is especially true if you like to play DX11 games (especially at high refresh rates) or oldies in DX9.
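(A rough way to see those core-to-core numbers yourself is a ping-pong test between two pinned threads; below is a minimal sketch. The logical-processor indices are just examples, and the measured value is a full round trip, so it is not directly comparable to published one-way figures.)

```cpp
// Minimal ping-pong sketch: two threads pinned to chosen logical processors
// bounce an atomic flag back and forth; the average round trip approximates
// the core-to-core (or CCX-to-CCX) communication latency discussed above.
#include <windows.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<int> flag{0};
constexpr int kIters = 1000000;

void Pin(DWORD_PTR mask) { SetThreadAffinityMask(GetCurrentThread(), mask); }

void Responder(DWORD_PTR mask)
{
    Pin(mask);
    for (int i = 0; i < kIters; ++i)
    {
        while (flag.load(std::memory_order_acquire) != 1) {}   // wait for ping
        flag.store(2, std::memory_order_release);              // pong back
    }
}

int main()
{
    // Example: logical processor 0 vs logical processor 4. On a Zen 2 part,
    // pick cores on different CCXs to see the cross-CCX penalty.
    std::thread t(Responder, DWORD_PTR(1) << 4);
    Pin(DWORD_PTR(1) << 0);

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i)
    {
        flag.store(1, std::memory_order_release);               // ping
        while (flag.load(std::memory_order_acquire) != 2) {}    // wait for pong
    }
    auto end = std::chrono::steady_clock::now();
    t.join();

    double ns = std::chrono::duration<double, std::nano>(end - start).count();
    std::printf("avg round trip: %.1f ns\n", ns / kIters);
    return 0;
}
```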

Qwertydrive
Journeyman III

I am getting better FPS using DXVK in a lot of games rather than AMD's crappy Windows DX11 driver.

Sadly the wrapper doesn't work in some CPU-limited DX11 games while using an AMD GPU. Maybe some games can't be changed, or there are no API hooks available for the wrapper to work.