

s_k
Adept I

Awful OpenGL FBO performance when writing to nonzero attachment

I'm developing a game graphics engine with some screen-space processing (that's important), and found that our game runs incredibly slowly on AMD GPUs.

Investigation showed that the problem lies in writing to nonzero attachments of framebuffer objects. I managed to reproduce the problem with a minimal test on RX550, RX590 and RX 400-series hardware.

I prepared a minimal test pack in which the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rects per frame to it. There are four executables with four write patterns:

1) Shader output 0 is written to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers; all other outputs are set to GL_NONE.

2) Same as 1), but with output and attachment 1.

3) Output 0 is written to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and all draw buffers except 0 are masked with glColorMaski.

4) Same as 3), but for attachment 1.

I ran all tests on two machines with nearly identical CPUs and the following GPUs:

AMD Radeon RX550, driver version 19.30.01.16

NVIDIA GeForce GTX 650 Ti, which is roughly half as powerful as the RX550

and got these results (column titles are executable names):

| GPU | FillRate_attachment0 | FillRate_attachment1 | FillRate_attachment0_masked | FillRate_attachment1_masked |
|---|---|---|---|---|
| Radeon RX550 | 350 FPS | 185 FPS | 330 FPS | 175 FPS |
| GeForce GTX 650 Ti | 195 FPS | 195 FPS | 195 FPS | 235 FPS |

We see that when writing to a nonzero attachment, the AMD GPU is much slower than both the less powerful NVIDIA GPU and its own attachment 0 result. Masking draw buffer output with glColorMaski also costs some FPS.

I also tried using renderbuffers instead of textures, other image formats (the formats in the tests are the most compatible ones), and a power-of-two-sized framebuffer. The results were the same.

Explicitly disabling the scissor, stencil and depth tests does not help.

If I decrease the number of attachments, or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.

glClear calls are also affected, and their performance under various conditions is consistent with the observations above.

Pre-built test executables are attached to the post or can be downloaded from Google Drive (FillrateTest.zip).

Test sources (with an MSVS-friendly CMake build system) are available here: https://github.com/sergeyext/FillRate

All four programs show a black window and a console with an FPS counter.

This problem is probably related to the "Dreadful OpenGL performance" or "Abysmal OpenGL performance (RX480)" threads.

Update:

My teammate ran the tests on a Radeon HD 3000 under Linux, both natively and through Wine. Both runs showed the same large difference between the attachment0 and attachment1 tests.

I can't tell the exact driver version, but it is the one provided by the Ubuntu 19.04 repositories.

Update 2:

I built the attachment0 and attachment1 tests for WebGL via Emscripten and ran them on the Radeon RX550. The full source is in the problem's GitHub repo; the build command lines are:

emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html

Both test programs issue a single draw call: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);

First test: Firefox with the default config, i.e. Direct3D-backed ANGLE.

Unmasked Vendor:    Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS

Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:

Unmasked Vendor:    ATI Technologies Inc. 
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS

We see that DirectX is not affected by the problem, while the OpenGL issue is reproducible in WebGL.

I also started a question on Stack Overflow.

dorisyan
Staff

Re: Awful OpenGL FBO performance when writing to nonzero attachment

Hi @s_k, thanks for your report. I will investigate this issue as soon as possible.

__ian__
Adept I

Re: Awful OpenGL FBO performance when writing to nonzero attachment

Sounds similar to a problem I have. I have an FBO with multiple texture attachments and depth/stencil, and simply drawing it to the backbuffer absolutely nukes performance, but only on ATI cards.

dorisyan
Staff

Re: Awful OpenGL FBO performance when writing to nonzero attachment

I have investigated this issue. To avoid the problem, you can use texture0 as much as possible; I will fix it soon.

s_k
Adept I

Re: Awful OpenGL FBO performance when writing to nonzero attachment

Thank you very much!

I would be grateful if you let me know when the problem is fixed.

dorisyan
Staff

Re: Awful OpenGL FBO performance when writing to nonzero attachment

Hi @s_k, this bug has been fixed on my machine; please refer to the following images.

If everything goes well, it may be released in the next driver. Thanks for your patience.

pastedImage_1.png

pastedImage_2.png

s_k
Adept I

Re: Awful OpenGL FBO performance when writing to nonzero attachment

Got it. We are really looking forward to the release and greatly appreciate your effort!

s_k
Adept I

Re: Awful OpenGL FBO performance when writing to nonzero attachment

The fix was released in Adrenalin 2020 Edition 19.12.2 Recommended (WHQL). Glory to the AMD OpenGL driver team! Thank you very much! I hope this will solve many other people's issues with OpenGL games.

By the way, I didn't see the fix announced in the release notes.