I guess this can be considered a kind of follow-up to Abysmal OpenGL performance (RX480).
Basically, I tried my testcase on the following systems:
I believe I don't need any further explanation.
My educated guess is that GL commands aren't dispatched to a separate thread.
Instructions to use the thing shouldn't be any different from those contained in this last link.
Here are a few initial observations.
I ran the application with Crimson 16.9.2 on a Windows 10 x64 machine with an RX480 + i7-6700K and was getting around 54 FPS.
What version of the driver and what OS are you running?
I was also able to capture performance with CodeXL and PerfStudio when I ran GSDumpGUI.exe with the following command line arguments:
E:\Work\AMD\pcsx2\bin\plugins\GSdx32-SSE2.dll E:\Work\AMD\Community\perf-case.7z\gsdx_20160924182111.gs GSReplay -1
The obvious hotspots here included.
I will try a few more configurations and let you know what else I find.
Just to elaborate slightly on this (as I am one of the developers for PCSX2), this performance drop is consistent across the entire AMD range, regardless of computer specs.
OpenGL performance is usually roughly half that seen on DX11 using the same card/setup.
On Nvidia cards the performance of OpenGL vs DX11 is about the same; sometimes OpenGL is 1-2% slower, but generally the speed is the same.
So there is certainly an issue with the driver. One of our guys, who makes hardware for a living and also works on GSDX, said the OpenGL driver seems very single-threaded, whereas Nvidia has a multithreaded driver for OpenGL. This wasn't obvious until he enabled multithreaded support in GSDX when initialising OpenGL; that is when the gap between the card manufacturers appeared.
One of our developers found an optimization to the OpenGL Program Pipeline implementation. It should get rolled into a release soon.
This was in October. Still nothing.
Almost a year later, I managed to find yet another testcase.
The aforementioned AMD systems can only get ~3, ~20 and ~60 fps in each of the tests respectively (basically no matter the GPU).
The NVIDIA smartphone-sized PC can reach 6 (7 with the multi-thread switch), 45 and 105 fps instead.
I believe I might know a source of the dramatic loss of performance compared to the competitor: scroll to "Threaded Validation and Submission" in "OpenGL like Vulkan". The Mesa driver on Linux attempts to do the same (spawn a thread dedicated to draw calls), and it also has about 30% higher performance under certain conditions.
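To make the idea a bit more concrete, here is a minimal sketch of what "spawn a thread dedicated to draw calls" could look like in principle. This is not code from any real driver: the CommandQueue class and its submit/flush methods are hypothetical names made up for illustration, and real drivers are certainly far more involved. The only point is that the application thread merely records a command and returns, while the expensive validation and submission work happens on a worker thread.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical illustration only; not taken from any actual GL driver.
class CommandQueue {
public:
    CommandQueue() : worker_([this] { run(); }) {}

    ~CommandQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        wake_cv_.notify_one();
        worker_.join();  // the worker drains any remaining commands first
    }

    // Called on the application thread: record the command and return at once.
    void submit(std::function<void()> cmd) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pending_;
            commands_.push(std::move(cmd));
        }
        wake_cv_.notify_one();
    }

    // Block until every submitted command has run (similar in spirit to glFinish).
    void flush() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> cmd;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_cv_.wait(lock, [this] { return done_ || !commands_.empty(); });
                if (commands_.empty()) return;  // shutting down and fully drained
                cmd = std::move(commands_.front());
                commands_.pop();
            }
            cmd();  // the costly work runs here, outside the lock, in FIFO order
            {
                std::lock_guard<std::mutex> lock(mutex_);
                --pending_;
            }
            idle_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_cv_;
    std::condition_variable idle_cv_;
    std::queue<std::function<void()>> commands_;
    int pending_ = 0;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

With something like this, a renderer that is capped on its main thread would only pay the cost of submit() per call, and would only have to wait in flush() at points where it really needs the results.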
Another reason is that even without that feature, NVIDIA is faster compared to AMD at OpenGL rendering.
The issue might be more apparent on renderers that are capping their CPU thread.
EDIT: I no longer believe that's the main contributor, see below.
The multithreading feature of other drivers appears to NOT be the main contributor of their better performance. Even if I turn that feature off on the Mesa driver, the performance of that open source driver remains about 30 to 40% better on renderers that are CPU hungry.
I know something similar is true of the NVIDIA driver on Windows when their threading optimization feature is turned off, and I confirmed there is not much CPU activity beyond the main renderer. Maybe AMD software has a simple design flaw that holds it back.
The open driver might not be all bells and whistles either, losing even against the hated fglrx (in CPU-bound cases, but that still says a lot considering it's way faster elsewhere).
EDIT: that's due to a roughly 25% performance regression in recent months. Unsure about comparisons made with a fixed version.