Awful OpenGL FBO performance when writing to nonzero attachment

Discussion created by s_k on Aug 12, 2019
Latest reply on Oct 2, 2019 by s_k

I'm making a game graphics engine that does some screen-space processing (that's important), and found that our game runs extremely slowly on AMD GPUs.

 

Investigation showed that the problem lies in writing to nonzero attachments of a framebuffer object. I managed to reproduce it with a minimal test on RX550, RX590 and RX400 series hardware.

 

I prepared a minimal test pack, where the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rects per frame to it. There are four executables with four patterns of writing:

1) Writing shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers. All other outputs are set to GL_NONE.

2) Same as 1), but with output and attachment 1.

3) Writing output 0 to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and all draw buffers except 0 are masked with glColorMaski.

4) Same as 3), but for attachment 1.
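The routing used by the four patterns above can be sketched as follows. This is only an illustration, not the code from the test repository: the helper names (routeSingleOutput, routeAllOutputs, writeMasks) are made up for this post, and the two enum values are copied in as plain constants so the snippet compiles without GL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Numeric values of the OpenGL enums, duplicated here so no GL header is needed.
constexpr unsigned kNone = 0x0000;              // GL_NONE
constexpr unsigned kColorAttachment0 = 0x8CE0;  // GL_COLOR_ATTACHMENT0

// Patterns 1) and 2): only shader output `active` is routed to the attachment
// with the same index; every other output is GL_NONE.
std::vector<unsigned> routeSingleOutput(std::size_t outputCount, std::size_t active) {
    assert(active < outputCount);
    std::vector<unsigned> bufs(outputCount, kNone);
    bufs[active] = kColorAttachment0 + static_cast<unsigned>(active);
    return bufs;
}

// Patterns 3) and 4): shader output i is routed to attachment i for every i.
std::vector<unsigned> routeAllOutputs(std::size_t outputCount) {
    std::vector<unsigned> bufs(outputCount);
    for (std::size_t i = 0; i < outputCount; ++i) {
        bufs[i] = kColorAttachment0 + static_cast<unsigned>(i);
    }
    return bufs;
}

// Patterns 3) and 4): per-draw-buffer write mask, enabled only for `active`.
std::vector<bool> writeMasks(std::size_t outputCount, std::size_t active) {
    assert(active < outputCount);
    std::vector<bool> masks(outputCount, false);
    masks[active] = true;
    return masks;
}
```

With a bound FBO, the arrays would be passed as glDrawBuffers(static_cast<GLsizei>(bufs.size()), bufs.data()), and each mask m at index i as glColorMaski(i, m, m, m, m).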

 

I ran all tests on two machines with nearly identical CPUs and the following GPUs:

AMD Radeon RX550, driver version 19.30.01.16

Nvidia Geforce GTX 650 Ti, which is roughly half as powerful as the RX550

 

and got these results (column titles are executable names):

 

GPU                | FillRate_attachment0 | FillRate_attachment1 | FillRate_attachment0_masked | FillRate_attachment1_masked
Radeon RX550       | 350 FPS              | 185 FPS              | 330 FPS                     | 175 FPS
Geforce GTX 650 Ti | 195 FPS              | 195 FPS              | 195 FPS                     | 235 FPS

 

We see that when writing to a nonzero attachment, the AMD GPU is much slower than both the less powerful Nvidia GPU and itself writing to attachment 0. Masking draw buffer output with glColorMaski also costs the RX550 a few FPS.
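To put numbers on the gap, the FPS figures from the results table can be converted to frame times and slowdown factors. The two helpers below are hypothetical names used only for this arithmetic, not part of the test sources.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times longer a frame takes at `measuredFps` than at `baselineFps`.
double slowdown(double baselineFps, double measuredFps) {
    assert(baselineFps > 0.0 && measuredFps > 0.0);
    return baselineFps / measuredFps;
}
```

For the RX550, slowdown(350.0, 185.0) and slowdown(330.0, 175.0) both come out at about 1.89, so a frame takes nearly twice as long when attachment 1 is the target. For the GTX 650 Ti, slowdown(195.0, 195.0) is exactly 1.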

 

I also tried using renderbuffers instead of textures, using other image formats (although the formats in the tests are the most compatible ones), and rendering to a power-of-two sized framebuffer. The results were the same.

 

Explicitly turning off scissor, stencil and depth tests does not help.

 

If I decrease the number of attachments or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.
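The dependence on coverage and attachment count can be written down as a rough cost model. It is only a sketch under assumptions: it supposes that both x and y coordinates are multiplied by the same factor, and that the cost scales with every attached image rather than only the one being written, which is what the observations suggest but is not something I have confirmed. The function names and the width/height parameters are placeholders.

```cpp
#include <cassert>

// Fraction of the framebuffer covered by a fullscreen rect whose clip-space
// vertex coordinates are all multiplied by `scale` (assumed 0 <= scale <= 1).
double coveredFraction(double scale) {
    assert(scale >= 0.0 && scale <= 1.0);
    return scale * scale;  // width and height both shrink by `scale`
}

// Hypothetical cost model: pixel writes per frame if every attachment of the
// FBO contributes to the cost, not just the one that receives shader output.
double writesPerFrame(int width, int height, int rects, int attachments, double scale) {
    return static_cast<double>(width) * height * rects * attachments *
           coveredFraction(scale);
}
```

Under this model, halving the vertex coordinates (scale 0.5) cuts the work to a quarter, and going from six attachments to three halves it.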

 

glClear calls are also affected, and their performance under various conditions fits the above observations.

 

Pre-built test executables are attached to the post or can be downloaded from Google Drive: FillrateTest.zip

Test sources (with an MSVS-friendly CMake build system) are available here: https://github.com/sergeyext/FillRate

All four programs show a black window and a console with an FPS counter.

 

This problem is probably related to the threads "Dreadful OpenGL performance" or "Abysmal OpenGL performance (RX480)".

 

Upd:

My teammate ran the tests on a Radeon HD 3000 under Linux, both natively and through Wine. Both runs showed the same huge difference between the attachment0 and attachment1 tests.

I can't tell the exact driver version, but it is the one provided by the Ubuntu 19.04 repos.

 

Upd2:

 

I built the attachment0 and attachment1 tests for WebGL via Emscripten and ran them on the Radeon RX550. The full source is in the problem's GitHub repo; the build command lines are:

emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html

Both test programs issue a single draw call per frame: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);

 

First test: Firefox with default config, i.e. DirectX-backed ANGLE.

Unmasked Vendor:    Google Inc.
Unmasked Renderer:  ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS

 

Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:

Unmasked Vendor:    ATI Technologies Inc. 
Unmasked Renderer:  Radeon RX550/550 Series 
attachment0: 38 FPS
attachment1: 19 FPS

 

We see that DirectX is not affected by the problem, and that the OpenGL issue is reproducible in WebGL.

 

I also asked a question on Stack Overflow.
