I'm making a game graphics engine with some screen-space processing (that's important), and found that our game runs incredibly slowly on AMD GPUs.
Investigation showed that the problem is in writing to nonzero attachments of a framebuffer object. I managed to reproduce the problem with a minimal test on RX550, RX590 and RX400-series hardware.
I prepared a minimal test pack, where the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rects per frame to it. There are four executables with four patterns of writing:
1) Writing shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers. All other outputs are set to GL_NONE.
2) Same as 1), but with output and attachment 1.
3) Writing output 0 to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and all drawbuffers except 0 are masked with glColorMaski.
4) Same as 3), but for attachment 1.
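To make the four patterns concrete, here is a rough sketch of the state each one would set up. It is not copied from the test sources: the helper names are made up for illustration, and the two GL enum values are redefined locally (GL_NONE = 0, GL_COLOR_ATTACHMENT0 = 0x8CE0) so the snippet builds without GL headers or a context.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Enum values as defined in the OpenGL headers, repeated here so the
// sketch compiles standalone. The k* names are illustrative only.
constexpr std::uint32_t kGlNone = 0x0000;             // GL_NONE
constexpr std::uint32_t kGlColorAttachment0 = 0x8CE0; // GL_COLOR_ATTACHMENT0
constexpr int kAttachmentCount = 6;                   // six RGBA attachments

// Patterns 1 and 2: only output `target` is routed to its attachment,
// every other entry is GL_NONE.
std::vector<std::uint32_t> singleOutputDrawBuffers(int target) {
    assert(target >= 0 && target < kAttachmentCount);
    std::vector<std::uint32_t> bufs(kAttachmentCount, kGlNone);
    bufs[target] = kGlColorAttachment0 + static_cast<std::uint32_t>(target);
    return bufs;
}

// Patterns 3 and 4: output i is routed to attachment i for all outputs.
std::vector<std::uint32_t> allOutputsDrawBuffers() {
    std::vector<std::uint32_t> bufs(kAttachmentCount);
    for (int i = 0; i < kAttachmentCount; ++i) {
        bufs[i] = kGlColorAttachment0 + static_cast<std::uint32_t>(i);
    }
    return bufs;
}

// Patterns 3 and 4: per-drawbuffer write enable, true only for `target`.
std::vector<bool> colorWriteEnabled(int target) {
    assert(target >= 0 && target < kAttachmentCount);
    std::vector<bool> enabled(kAttachmentCount, false);
    enabled[target] = true;
    return enabled;
}
```

In a real program the arrays would be passed as glDrawBuffers(6, bufs.data()), and each flag e for drawbuffer i would be applied as glColorMaski(i, e, e, e, e).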
I ran all tests on two machines with nearly identical CPUs and the following GPUs:
AMD Radeon RX550, driver version 19.30.01.16
Nvidia Geforce GTX 650 Ti, which is roughly 2x less powerful than the RX550
and got these results (column titles are executable names):
GPU | FillRate_attachment0 | FillRate_attachment1 | FillRate_attachment0_masked | FillRate_attachment1_masked |
---|---|---|---|---|
Radeon RX550 | 350 FPS | 185 FPS | 330 FPS | 175 FPS |
Geforce GTX 650 Ti | 195 FPS | 195 FPS | 195 FPS | 235 FPS |
We see that when writing to a nonzero attachment, the AMD GPU is much slower than both the less powerful Nvidia GPU and itself writing to attachment 0. On the RX550, masking drawbuffer output with glColorMaski also costs a few FPS.
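For anyone who wants to put numbers on the slowdown, a small sketch that converts the FPS figures from the table into frame times and relative throughput. The function names are just illustrative; the only inputs are the measured FPS values.

```cpp
#include <cassert>
#include <cmath>

// Time spent per frame in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Throughput of one test relative to a baseline test (1.0 = same speed).
double relativeThroughput(double fps, double baselineFps) {
    assert(baselineFps > 0.0);
    return fps / baselineFps;
}
```

With the RX550 numbers, relativeThroughput(185, 350) is about 0.53, and the frame time goes from about 2.86 ms (attachment 0) to about 5.41 ms (attachment 1). The masked pair gives nearly the same ratio, 175 / 330, also about 0.53.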
I also tried using renderbuffers instead of textures, other image formats (the formats in the tests are the most compatible ones), and a power-of-two-sized framebuffer. The results were the same.
Explicitly turning off scissor, stencil and depth tests does not help.
If I decrease the number of attachments or reduce framebuffer coverage by multiplying the vertex coordinates by a factor less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.
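A note on the coverage experiment, as a sketch under the assumption that the rect is drawn as two triangles with corners at ±1 in normalized device coordinates and that both axes are scaled by the same factor s (the helper names are hypothetical). The number of covered pixels then shrinks with s squared, not with s.

```cpp
#include <array>
#include <cassert>

// Two triangles (x, y pairs) covering [-s, s] x [-s, s] in normalized
// device coordinates; s = 1 gives a fullscreen rect.
std::array<float, 12> scaledQuad(float s) {
    return {{
        -s, -s,   s, -s,   s,  s,   // first triangle
        -s, -s,   s,  s,  -s,  s    // second triangle
    }};
}

// Fraction of the framebuffer covered by that quad for 0 <= s <= 1.
double coveredFraction(double s) {
    assert(s >= 0.0 && s <= 1.0);
    return s * s;
}
```

So multiplying the coordinates by 0.5 leaves a quarter of the pixels to write, which is worth keeping in mind when comparing FPS across different scale factors.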
glClear calls are also affected, and their performance under various conditions fits the above observations.
Pre-built test executables are attached to the post or can be downloaded from Google Drive: FillrateTest.zip
Test sources (with MSVS-friendly cmake buildsystem) are available here: https://github.com/sergeyext/FillRate
All four programs show a black window and a console with an FPS counter.
This problem is probably related to the threads "Dreadful OpenGL performance" or "Abysmal OpenGL performance (RX480)".
Upd:
My teammate ran the tests on a Radeon HD 3000 under Linux, both natively and using Wine. Both test runs showed the same huge difference between the attachment0 and attachment1 tests.
I can't tell the exact driver version, but it's the one provided by the Ubuntu 19.04 repos.
Upd2:
I built the attachment0 and attachment1 tests for WebGL via Emscripten and ran them on the Radeon RX550. The full source is in the problem's GitHub repo; the build command lines are:
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html
Both test programs issue a single draw call: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);
First test: Firefox with default config, i.e. DirectX-backed ANGLE.
Unmasked Vendor: Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS
Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:
Unmasked Vendor: ATI Technologies Inc.
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS
We see that DirectX is not affected by the problem, and the OpenGL issue is reproducible in WebGL.
I also posted a question on Stack Overflow.
Hi @s_k, thanks for your report. I will investigate this issue as soon as possible.
Sounds similar to a problem I have. I have an FBO with multiple texture attachments and depth/stencil, and simply drawing it to the backbuffer absolutely nukes performance only on ATI cards.
I have investigated this issue. To avoid the problem for now, use texture0 as far as possible; I will fix it soon.
Thank you very much!
I would be grateful if you let me know when the problem is fixed.
Hi @s_k, this bug has been fixed on my machine, please refer to the following image.
If everything goes well, it may be released in the next driver. Thanks for your patience.
Got it. We are really looking forward to the release and greatly appreciate your effort!
The fix was released in Adrenalin 2020 Edition 19.12.2 Recommended (WHQL). Glory to the AMD OpenGL driver team! Thank you very much! I hope this will solve many other people's issues with OpenGL games.
Btw, I didn't see the fix announced in the release notes.