I'm making a game graphics engine with some screen-space processing (that's important), and figured out that our game runs incredibly slow on AMD gpus.
Investigation showed that the problem is in writing to nonzero attachments of a framebuffer objects. I managed to reproduce the problem with a minimal test on RX550, RX590 and RX400 series hardware.
I prepared a minimal test pack, where the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rects per frame to it. There are four executables with four patterns of writing:
1) Writing shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers. All other outputs are set to GL_NONE.
2) Same as 1), but with output and attachment 1.
3) Writing output 0 to attachment 0, but all six shader outputs are routed to attachments 0..6 respectively, and all drawbuffers except 0 are masked with glColorMaski.
4) Same as 3, but for attachment 1.
I run all tests on two machines with almost similar CPUs and following GPUs:
AMD Radeon RX550, driver version 19.30.01.16
Nvidia Geforce GTX 650 Ti, which is ~2x less powerful than RX550
and got these results (column titles are executable names):
|Radeon RX550||350 FPS||185 FPS||330 FPS||175 FPS|
|Geforce GTX 650 Ti||195 FPS||195 FPS||195 FPS||235 FPS|
We see that when writing to nonzero attachment, AMD is much slower than less powerful nvidia GPU and than itself. Also global masking of drawbuffer output drops some fps.
I also tried to use renderbuffers instead of textures, use other image formats (while the formats in tests are the most compatible ones), render to power-of-two sized framebuffer. Results were the same.
Explicitly turning off scissor, stencil and depth tests does not help.
If I decrease number of attachments or reduce framebuffer coverage by multiplying vertex coords by less then 1 value, test performance increases proportionally, and finally RX550 outperforms GTX 650 Ti.
glClear calls are also affected, and their performance under various conditions fits the above observations.
Pre-built test executables are attached to the post or can be downloaded from Google drive: FillrateTest.zip - Google Drive
Test sources (with MSVS-friendly cmake buildsystem) are available here: https://github.com/sergeyext/FillRate
All four programs show a black window and console with FPS counter.
My teammate launched tests on Radeon HD 3000 with Linux natively and using Wine. Both test runs exposed the same huge difference between attachment0 and attachment1 tests.
I can't tell exact driver version, but it's provided by Ubuntu 19.04 repos.
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html
Both test programs issue a single drawcall:
glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);
First test: Firefox with default config, i.e. DirectX-backed ANGLE.
Unmasked Vendor: Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS
Second test: Firefox with disabled ANGLE, (
webgl.disable-angle = true), using native OpenGL:
Unmasked Vendor: ATI Technologies Inc.
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS
We see that DirectX is not affected by the problem, and OpenGL issue is reproducible in WebGL.
Also started a question on Stackoverflow.