mirh
Adept III

Dreadful OpenGL performance

I guess this can be considered a kind of follow-up to "Abysmal OpenGL performance (RX480)".

Basically, I tried my test case on the following systems:

  • Core 2 Duo E8400 + Radeon 7750: ~19 FPS
  • Phenom II X4 965 + Radeon RX 480: ~25 FPS
  • Core 2 Duo 6320 + GeForce GT 430: ~58 FPS

I believe I don't need any further explanation.

My educated guess is that gl commands aren't dispatched to a separate thread.
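To make the guess above concrete, here is a minimal Python sketch of what dispatching commands to a separate thread looks like in general. It is only a model of the producer/consumer pattern, not actual driver code from AMD or Nvidia; the `ThreadedDispatcher` class and the "draw" command name are made up for illustration.

```python
import queue
import threading


class ThreadedDispatcher:
    """Hypothetical model: the app thread enqueues commands, a worker executes them."""

    _STOP = object()  # sentinel telling the worker thread to exit

    def __init__(self):
        self._commands = queue.Queue()
        self.executed = []  # stands in for the work a driver would do per command
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name, *args):
        # Returns as soon as the command is queued; the per-command cost
        # is paid on the worker thread, not on the caller.
        self._commands.put((name, args))

    def _run(self):
        while True:
            item = self._commands.get()
            if item is self._STOP:
                break
            self.executed.append(item)

    def finish(self):
        # Block until everything queued so far has been processed.
        self._commands.put(self._STOP)
        self._worker.join()


dispatcher = ThreadedDispatcher()
for i in range(3):
    dispatcher.submit("draw", i)
dispatcher.finish()
print(dispatcher.executed)
```

The point of this design is that `submit` does not wait for the command to be processed, so the application thread can keep going. A driver that does all of its per-call work inline on the calling thread cannot overlap that cost with the application, which is what I suspect is happening here.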

I would have liked to give some more info, but I had problems with both CodeXL and PerfStudio.

Instructions to use the thing shouldn't be any different from those contained in this last link.

29 Replies
dwitczak
Staff

One of our GL driver engineers is looking into this report.

aaronhagan
Staff

Here are a few initial observations.

I ran the application with Crimson 16.9.2 on a Windows 10 x64 machine with an RX480 + i7-6700K and was getting around 54 FPS.
What version of the driver and what OS are you running?

RX480.jpg

I was also able to capture performance with CodeXL and PerfStudio when I ran GSDumpGUI.exe with the following command line arguments:

E:\Work\AMD\pcsx2\bin\plugins\GSdx32-SSE2.dll  E:\Work\AMD\Community\perf-case.7z\gsdx_20160924182111.gs GSReplay -1

The obvious hotspots here included:

GSRendererOGL::DrawPrims

   -> GSDeviceOGL::SetupCB

   -> GSRendererOGL::SetupIA

   -> GSRendererOGL::SendDraw

I will try a few more configurations and let you know what else I find.

Thanks

refractionpcsx2
Adept I

Just to elaborate slightly on this (as I am one of the developers for PCSX2), this performance drop is consistent across the entire AMD range, regardless of computer specs.

OpenGL performance is usually roughly half that seen on DX11 using the same card/setup.

On Nvidia cards the performance of OpenGL vs DX11 is about the same. Sometimes OpenGL is 1-2% slower, but generally it is the same speed.

So there is certainly an issue with the driver. One of our guys, who makes hardware for a living and also works on GSDX, said the AMD OpenGL driver seems very single-threaded, whereas Nvidia has a multithreaded driver for OpenGL. This wasn't obvious until he enabled the multithreaded support in GSDX when initialising OpenGL; that is when the gap between the card manufacturers appeared.

refractionpcsx2
Adept I

Is there any update on this at all?

mirh
Adept III

Hi Mirh,

One of our developers found an optimization to the OpenGL Program Pipeline implementation. It should get rolled into a release soon.

Thanks,

Aaron Hagan

This was in October. Still nothing.

Almost a year later, I managed to find yet another test case.

https://github.com/RippeR37/GL_vs_VK

The aforementioned AMD systems can only get ~3, ~20 and ~60 FPS in the three tests respectively (basically no matter the GPU).

The Nvidia smartphone-sized PC can reach 6 (7 with the multi-thread switch), 45 and 105 FPS instead.

mirh
Adept III

Bumping this, given that engineers seem keen these days.

epigramx
Adept I

I believe I might know a source of the dramatic loss of performance compared to the competitor: scroll to "Threaded Validation and Submission" in "OpenGL like Vulkan". The Mesa driver on Linux attempts to do the same (spawn a thread dedicated to draw calls), and it also has about 30% higher performance under certain conditions.

Another reason is that even without that feature, NVIDIA is faster compared to AMD at OpenGL rendering.

The issue might be more apparent on renderers that are capping their CPU thread.

EDIT: I no longer believe that's the main contributor, see below.

epigramx
Adept I

The multithreading feature of other drivers appears NOT to be the main contributor to their better performance. Even if I turn that feature off on the Mesa driver, the performance of that open source driver remains about 30 to 40% better on renderers that are CPU hungry.

I know something similar is true of the NVIDIA driver on Windows when its threading optimization feature is turned off, and I confirmed there is not much CPU activity beyond the main renderer thread. Maybe AMD's software has a simple design flaw that holds it back.


The open driver might not be all bells and whistles either, losing even against the hated fglrx (in CPU-bound cases, but that's still saying a lot considering it's way faster elsewhere).

EDIT: that's due to a roughly 25% performance regression in recent months. I'm unsure about comparisons made with a fixed version.

Ping aaronhagan & dwitczak

Even if some native games are CPU bound, I wouldn't call their renderers necessarily CPU hungry, since it might be the game logic that is CPU bound. Try an emulator renderer like Citra's or Cemu's, which ensures the renderer is on a CPU-bound thread, and you'll see a significant handicap on the AMD OpenGL driver on Windows compared to Mesa on Linux.

PS. Most native PC games are efficient enough at the system side of rendering to not be low FPS before that condition is met, so people don't even notice. But in those specific cases that the FPS remains low because of that condition, the OpenGL driver for Windows reveals that it's significantly inefficient.

xhuang
Staff

Hello mirh, using gl_vs_vk on an AMD R9 Fury with the latest driver, I get ~5 FPS, ~30 FPS and ~90 FPS respectively. May I have the latest test results from your side, as well as the GPU/OS/driver info?

pastedImage_0.png

pastedImage_1.png

pastedImage_2.png


I just tested the GL_vs_VK tool on an Nvidia NVS 315 (pretty much a display adapter rather than a graphics card) using Windows 10 and the 391.03 Quadro driver.

These are the results of my test:

Test1: 11FPS

Test2: 61FPS

Test3: 32-53FPS (Fluctuates quite a bit)

Considering I beat the R9 Fury, a card vastly more powerful than the thing I'm using, by over 2x in 2 of the 3 tests, that is an abysmal showing from the R9 card.
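For reference, the arithmetic behind that comparison can be sketched in a few lines of Python. The figures are the ones posted in this thread (R9 Fury: ~5, ~30 and ~90 FPS; NVS 315: 11, 61 and 32-53 FPS). Taking the upper end of the fluctuating Test 3 range is my own assumption, chosen to be as generous as possible to the NVS 315.

```python
# Approximate OpenGL FPS figures as posted in this thread.
fury = {"Test 1": 5, "Test 2": 30, "Test 3": 90}
# Test 3 fluctuated between 32 and 53 FPS; the upper bound is used (assumption).
nvs315 = {"Test 1": 11, "Test 2": 61, "Test 3": 53}

# Ratio > 1 means the NVS 315 was faster than the R9 Fury in that test.
ratios = {test: nvs315[test] / fury[test] for test in fury}

for test, ratio in ratios.items():
    print(f"{test}: NVS 315 / R9 Fury = {ratio:.2f}x")
```

So the display adapter is ahead by a bit more than 2x in Tests 1 and 2, and only falls behind in Test 3.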

Same results as last time (even though the E8400 has become a Q9505).

The 7750 is on Windows 7 x64 with the latest 18.8.1.

My very broadly educated guess is that you are getting massively CPU-limited.

Hello mirh and refractionpcsx2, gl_vs_vk has Vulkan support. Are you seeing a performance gap for VK?


Hi xhuang, sorry, this NVS 315 doesn't support Vulkan as far as I can tell, so I am unable to test that on this machine.

Just for some additional test data, I had a friend test his AMD card to see what results he gets. They are as follows:

GPU: Asus AMD R7 360

OS: Windows 7 64bit

Driver version: 18.8.1

Vulkan:

Test 1: 18 fps

Test 2: 58 fps

Test 3: 148-192 fps

OpenGL:

Test 1: 6 fps

Test 2: 41 fps

Test 3: 150 fps

As for myself, I can try it on my GTX 980Ti tonight to see what kind of performance numbers that gives.

Edit:

OK, I tested my 980Ti using the 397.93 drivers (CPU is an i5 4690k @ 4.3 GHz). Here are the results; I would expect an R9 Fury to reach at least 75% of them.

OGL

Test 1: 25 FPS

Test 2: 91 FPS

Test 3: 298 FPS

VK

Test 1: 35 FPS

Test 2: 153 FPS

Test 3: 1300-1800 FPS (and a lot of squealing xD )
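To put the Vulkan vs OpenGL gap in numbers, here is a small Python sketch using only the figures posted above for the R7 360 and the 980Ti. Where a result was reported as a range, the lower bound is used, which is an assumption on my part and understates the Vulkan lead in Test 3.

```python
# FPS figures as posted in this thread, in test order (Test 1, Test 2, Test 3).
# Ranges are reduced to their lower bound (assumption).
results = {
    "R7 360": {"OpenGL": [6, 41, 150], "Vulkan": [18, 58, 148]},
    "GTX 980Ti": {"OpenGL": [25, 91, 298], "Vulkan": [35, 153, 1300]},
}

# Speedup > 1 means Vulkan was faster than OpenGL on the same card.
speedups = {
    gpu: [vk / gl for vk, gl in zip(apis["Vulkan"], apis["OpenGL"])]
    for gpu, apis in results.items()
}

for gpu, ratios in speedups.items():
    line = ", ".join(f"Test {i}: {r:.2f}x" for i, r in enumerate(ratios, start=1))
    print(f"{gpu} (Vulkan / OpenGL): {line}")
```

On these numbers the AMD card gains 3x from Vulkan in Test 1 while the Nvidia card gains 1.4x, which would be consistent with the OpenGL path being the weak spot on AMD.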

Thanks for your information.

We will investigate this issue. Most likely we will start with gl_vs_vk first. We will keep you updated if we find something.

19.1.2 still doesn't show the slightest improvement (in fact, I think I even lost 10 FPS in Test 3).

Thanks for your patience, we're still working on this.

A couple of months have passed. We are now on 19.7.3 and it shows no improvements whatsoever.

Are there any news/updates?

Thanks for your mention, I will check it soon.


Here is another test case from our side (PCSX2) showing a worst-case scenario where the performance drop is massive on AMD. I don't know how helpful this will be, but I thought I'd post it here anyway.

https://drive.google.com/open?id=1v0TRweNTBdjOLYzBrSB7XD1Cbkoy8quq 

I created a topic with some test cases which might be related to this discussion: https://community.amd.com/message/2924860

I really hope this gets fixed; the current situation with the Windows driver is not that good.

In fact, this is long-term work. We need to find out (1) what may affect the performance and (2) whether we can fix it without other side effects.

Thanks for your patience!

tapek
Adept III

I do not know if it's related, but I have noticed performance problems with some older games in D3D9 related to vertex processing.
Even in a fairly new game, Final Fantasy XIII, transferring a 358400-byte vertex buffer kills performance on my old R7 360, and the game does this every frame.

When I forced the pool in IDirect3DDevice9::CreateVertexBuffer from D3DPOOL_MANAGED to D3DPOOL_SYSTEMMEM so that the buffer stays in RAM, I got 60 FPS vs 15 FPS (tested in the save menu; that vertex buffer contains the vertices of menu elements like the hand cursor, etc.).

It is possible that some common code in the AMD driver responsible for vertex processing has performance flaws.

I have the code for my tweaks in a wrapper on GitHub here: GitHub - Nucleoprotein/OneTweakNG: OneTweak for all games with game performance fixes.

Something similar also happens in RE4 (not HD) and King's Bounty: The Legend, but for those games I change the behavior flags in IDirect3D9::CreateDevice to D3DCREATE_MIXED_VERTEXPROCESSING, and that fixes them (>60 FPS vs 30-40 FPS in KB, 30 FPS vs 15 FPS in RE4).

tapek
Adept III

Hello again,

I noticed a big improvement in radv:

radv: Align large buffers to the fragment size. - Patchwork

That is, a patch to allocate VRAM as a power of two, which was also added to drm/amdgpu for Linux 4.20. Maybe it would benefit Windows too. I don't know what you currently use in the Windows drivers, but this is a good place to start looking.
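For anyone unfamiliar with the idea, the size arithmetic involved can be sketched in Python as below. This only illustrates rounding a size up to an alignment or to a power of two; it is not the radv or amdgpu code, and the 64 KiB alignment is a hypothetical value picked for the example. The 358400-byte size is the vertex buffer from my previous post.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (size + alignment - 1) & ~(alignment - 1)


def next_power_of_two(size: int) -> int:
    """Return the smallest power of two that is >= size (size must be positive)."""
    assert size > 0
    return 1 << (size - 1).bit_length()


HYPOTHETICAL_ALIGNMENT = 64 * 1024  # assumption, for illustration only

print(align_up(358400, HYPOTHETICAL_ALIGNMENT))
print(next_power_of_two(358400))
```

Either way the allocation ends up somewhat larger than requested, in exchange for starting and ending on boundaries that the memory manager can map in bigger chunks.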


Hello tapek, we really appreciate your input trying to help us improve our performance. I've already sent the notice to the dev team. Thank you again!

refractionpcsx2
Adept I

Another year, still no progress (at least on the Windows side). C'mon guys, it's been 4 years. If you've given up on OpenGL, please just tell us. I get that OpenGL is considered "old" now, but it's still widely used in the emulation community at least, and it is far more functional than DX11. We hate having to recommend DX11 to our users, but we have little choice. We gave you a great test case (with a midrange GPU of 2018 being shamefully outperformed by a display adapter); surely you could work on something from that?

I look forward to hearing an update.

If you need another example of the performance issues, this was taken from a YouTube video (I don't know which one, it got passed around a bit), so you can use it as another example of the terrible OGL performance on Windows.

DOTA2perfAMD.jpg