I guess this can be considered a kind of follow-up to Abysmal OpenGL performance (RX480).
Basically, I tried my test case on the following systems:
I believe I don't need any further explanation.
My educated guess is that gl commands aren't dispatched to a separate thread.
Then I would have liked to give some more info, but I had problems with both CodeXL and PerfStudio.
Instructions to use it shouldn't be any different from those contained in this last link.
One of our GL driver engineers is looking into this report.
Here are a few initial observations.
I ran the application with Crimson 16.9.2 on a Windows 10 x64 machine with an RX480 + i7-6700K and was getting around 54 FPS.
What version of the driver and what OS are you running?
I was also able to capture performance with CodeXL and PerfStudio if I ran GSDumpGUI.exe with the following command line arguments:
E:\Work\AMD\pcsx2\bin\plugins\GSdx32-SSE2.dll E:\Work\AMD\Community\perf-case.7z\gsdx_20160924182111.gs GSReplay -1
The obvious hotspots here included:
I will try a few more configurations and let you know what else I find.
Just to elaborate slightly on this (as I am one of the developers for PCSX2), this performance drop is consistent across the entire AMD range, regardless of computer specs.
OpenGL performance is usually roughly half that seen on DX11 using the same card/setup.
On Nvidia cards the performance of OpenGL vs DX11 is about the same; sometimes OpenGL is 1-2% slower, but generally it is the same speed.
So there is certainly an issue with the driver. One of our guys, who makes hardware for a living and also works on GSDX, said the OpenGL driver seems very single-threaded, whereas Nvidia have a multithreaded driver for OpenGL. This wasn't obvious until he enabled the multithreaded support in GSDX when initialising OpenGL; that is when the gap between the card manufacturers appeared.
Is there any update on this at all?
One of our developers found an optimization to the OpenGL Program Pipeline implementation. It should get rolled into a release soon.
This was in October. Still nothing.
Almost a year later, I managed to come up with yet another test case.
The aforementioned AMD systems can only get ~3, ~20 and ~60 fps in each of the tests respectively (basically no matter the GPU).
The Nvidia smartphone-sized PC can reach 6 (7 with the multi-thread switch), 45 and 105 fps instead.
Bumping this up, given that engineers seem keen these days.
I believe I might know a source of the dramatic loss of performance compared to the competitor. Scroll to "Threaded Validation and Submission": OpenGL can be driven with threaded submission much like Vulkan. The Mesa driver on Linux attempts to do the same (spawn a thread dedicated to draw calls), and it also gets about 30% higher performance under certain conditions.
Another reason is that even without that feature, NVIDIA is faster than AMD at OpenGL rendering.
The issue might be more apparent with renderers that are saturating their CPU thread.
EDIT: I no longer believe that's the main contributor, see below.
The multithreading feature of other drivers appears NOT to be the main contributor to their better performance. Even if I turn that feature off in the Mesa driver, the open source driver's performance remains about 30 to 40% better on CPU-hungry renderers.
I know something similar is true of the NVIDIA driver on Windows when their threading optimization feature is turned off, and I confirmed there is not much CPU activity beyond the main renderer thread. Maybe AMD's software has a simple design flaw that holds it back.
The open driver might not have all those bells and whistles either, losing even against the hated fglrx (in CPU-bound cases, but that's still quite a lot to say, considering it's way faster elsewhere).
EDIT: that's due to a roughly 25% performance regression in the last months. Unsure about comparisons made against a fixed version.
Ping aaronhagan & dwitczak
Even if some native games are CPU-bound, I wouldn't call their renderers necessarily CPU-hungry, since it might be the game logic that is CPU-bound. Try an emulator renderer like Citra's or Cemu's, which ensures the renderer is on a CPU-bound thread, and you'll see a significant handicap on the AMD OpenGL driver for Windows compared to Mesa on Linux.
PS. Most native PC games are efficient enough at the system side of rendering not to hit low FPS before that condition is met, so people don't even notice. But in the specific cases where FPS stays low because of it, the OpenGL driver for Windows reveals that it's significantly inefficient.
Hello mirh, using gl_vs_vk on an AMD R9 Fury + latest driver, I get ~5 fps, ~30 fps and ~90 fps respectively. May I have the latest test results from your side, as well as the GPU/OS/driver info?
I just tested the GL_vs_VK tool on an Nvidia NVS 315 (pretty much a display adapter rather than a graphics card) using Windows 10 and the 391.03 Quadro driver.
These are the results of my test:
Test 3: 32-53 FPS (fluctuates quite a bit)
Considering I beat the R9 Fury, a card vastly more powerful than this thing I'm using, by over 2x in 2 of the 3 tests, that is an abysmal showing from the R9 card.
Same results as last time (even though the E8400 has since become a Q9505).
The 7750 is on Windows 7 x64 with the latest 18.8.1.
My very broad educated guess is that you are getting massively CPU-limited.
Hi xhuang, sorry, this NVS 315 doesn't support Vulkan as far as I can tell, so I am unable to test that on this machine.
Just for some additional test data, I had a friend test his AMD card to see what results he gets; they are as follows:
GPU: Asus AMD R7 360
OS: Windows 7 64bit
Driver version: 18.8.1
Test 1: 18 fps
Test 2: 58 fps
Test 3: 148-192 fps
Test 1: 6 fps
Test 2: 41 fps
Test 3: 150 fps
As for myself, I can try it on my GTX 980Ti tonight to see what kind of performance numbers that gives.
Ok, tested my 980Ti using the 397.93 drivers (CPU is an i5 4690K @ 4.3GHz); here are the results. I would expect an R9 Fury to reach at least 75% of these numbers.
Test 1: 25 FPS
Test 2: 91 FPS
Test 3: 298 FPS
Test 1: 35 FPS
Test 2: 153 FPS
Test 3: 1300-1800 FPS (and a lot of squealing xD )
Thanks for your information.
We will investigate this issue. Most likely we will start with gl_vs_vk first. We will update you if we find something.
19.1.2 still doesn't show the slightest improvement (in fact, I think I even lost 10 fps in test 3).
Thanks for your patience, we're still working on this.
A couple of months have passed; we are now on 19.7.3 and it shows no improvements whatsoever.
Is there any news or update?
Thanks for the mention, I will check it soon.
Here is another test case from our side (PCSX2), showing a worst-case scenario where the performance drop on AMD is massive. I don't know how helpful this will be, but I thought I'd post it here anyway.
I created a topic with some test cases which might be related to this discussion: https://community.amd.com/message/2924860
I really hope this gets fixed; the current situation with the Windows driver is not good.
In fact it's long-term work; we need to find out 1. what may affect the performance, and 2. whether we can fix it without other side effects.
Thanks for your patience!
I do not know if it's related, but I have noticed performance problems with some older D3D9 games, related to vertex processing.
Even in a fairly new game - Final Fantasy XIII - transferring a 358,400-byte vertex buffer kills performance on my old R7 360, and the game does this every frame.
When I force the pool in IDirect3DDevice9::CreateVertexBuffer from D3DPOOL_MANAGED to D3DPOOL_SYSTEMMEM, so the buffer stays in RAM, I get 60 FPS vs 15 FPS (tested in the save menu - that vertex buffer contains the vertices of menu elements like the hand cursor, etc.).
It is possible that some common code in the AMD driver responsible for vertex processing has performance flaws.
I have the code for my tweaks in a wrapper on GitHub here: GitHub - Nucleoprotein/OneTweakNG: OneTweak for all games with game performance fixes.
Something similar also happens in RE4 (not HD) and King's Bounty: The Legend - but for those games I change the behavior flags in IDirect3D9::CreateDevice to D3DCREATE_MIXED_VERTEXPROCESSING, and that fixes them (>60 FPS vs 30-40 FPS in KB, 30 FPS vs 15 FPS in RE4).
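For anyone wanting to reproduce these two tweaks, here is a minimal sketch of the argument rewriting a D3D9 wrapper can do before forwarding calls to the real runtime. The constant values mirror those in d3d9.h so the snippet is self-contained; in a real wrapper you would include <d3d9.h> and hook the COM methods instead of redefining anything.

```cpp
#include <cstdint>

// Constants mirroring d3d9.h (for illustration only).
enum D3DPOOL : uint32_t {
    D3DPOOL_DEFAULT   = 0,
    D3DPOOL_MANAGED   = 1,
    D3DPOOL_SYSTEMMEM = 2,
};
constexpr uint32_t D3DCREATE_SOFTWARE_VERTEXPROCESSING = 0x00000020;
constexpr uint32_t D3DCREATE_HARDWARE_VERTEXPROCESSING = 0x00000040;
constexpr uint32_t D3DCREATE_MIXED_VERTEXPROCESSING    = 0x00000080;

// FFXIII-style tweak: keep managed vertex buffers in system RAM so the
// per-frame transfer avoids the slow driver path described above.
D3DPOOL RewriteVertexBufferPool(D3DPOOL requested) {
    return requested == D3DPOOL_MANAGED ? D3DPOOL_SYSTEMMEM : requested;
}

// RE4 / King's Bounty tweak: replace the vertex-processing mode in the
// CreateDevice behavior flags with mixed vertex processing.
uint32_t RewriteBehaviorFlags(uint32_t flags) {
    flags &= ~(D3DCREATE_SOFTWARE_VERTEXPROCESSING |
               D3DCREATE_HARDWARE_VERTEXPROCESSING);
    return flags | D3DCREATE_MIXED_VERTEXPROCESSING;
}
```

A wrapper DLL would call these helpers inside its hooked CreateVertexBuffer and CreateDevice before delegating to the original interface.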
I noticed a big improvement in radv:
radv: Align large buffers to the fragment size. - Patchwork
i.e. a patch to allocate VRAM in power-of-two sizes. It was also added to drm/amdgpu for Linux 4.20; maybe it would benefit Windows too. I don't know what you currently use in the Windows drivers, but this is a good starting point.
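The rounding the patch describes boils down to padding allocation sizes to a power-of-two fragment boundary, which lets the kernel map larger page fragments. A tiny sketch of that arithmetic (the 64 KiB fragment size is an illustrative assumption, not the value the driver uses):

```cpp
#include <cstdint>

// Illustrative fragment size; must be a power of two for the mask trick.
constexpr uint64_t kFragmentSize = 64 * 1024;

// Round an allocation size up to the next multiple of the fragment size.
uint64_t AlignToFragment(uint64_t size) {
    return (size + kFragmentSize - 1) & ~(kFragmentSize - 1);
}
```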
Another year, still no progress (at least on the Windows side). C'mon guys, it's been 4 years; if you've given up on OpenGL, please just tell us. I get that OpenGL is considered "old" now, but it's still widely used in the emulation community at least, and it's far more functional than DX11; we hate having to recommend that to our users, but we have little choice. We gave you a great test case (with a midrange GPU of 2018 being shamefully outperformed by a display adapter), surely you could work from that?
I look forward to hearing an update.
If you need another example of the performance issues, this was taken from a YouTube video (dunno which one, it got passed around a bit), so you can use this as another example of the terrible OGL performance on Windows.