I am writing a multithreaded program that works with multiple GPUs.
This is a WINDOWED application (meaning DWM is active).
Basically it's one thread per GPU, and it roughly goes like this:
for every thread (GPU) {
    while (vsyncTrigger) {
        for all windows {
            do OpenGLContextSwitch
            do OpenGL operations ( )
            FlushGL()
            SwapBuffers()
        }
    }
}
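For anyone trying to reproduce this, the per-GPU loop above could be sketched in C++ roughly as follows. This is only an illustration, not the actual application code: `WindowOps`, `renderFrame` and `makeLoggingWindows` are hypothetical names, and the platform calls are passed in as callbacks. On Windows the callbacks would presumably wrap `wglMakeCurrent`, the drawing code, `glFlush` and `SwapBuffers`.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-window hooks (not from the original post). On Windows they
// would typically wrap wglMakeCurrent(hdc, hglrc), the application's draw
// code, glFlush(), and SwapBuffers(hdc).
struct WindowOps {
    std::function<void()> makeCurrent;
    std::function<void()> draw;
    std::function<void()> flush;
    std::function<void()> swap;
};

// One pass over every window owned by a single GPU thread.
// The owning thread is expected to call this once per vsync trigger.
void renderFrame(const std::vector<WindowOps>& windows) {
    for (const WindowOps& w : windows) {
        w.makeCurrent();  // context switch
        w.draw();         // OpenGL operations
        w.flush();        // flush queued commands
        w.swap();         // present the window
    }
}

// Helper for experiments: builds hooks that only record the call order,
// so the loop structure can be checked without a GL context.
std::vector<WindowOps> makeLoggingWindows(int count, std::vector<std::string>& log) {
    std::vector<WindowOps> windows;
    for (int i = 0; i < count; ++i) {
        const std::string id = std::to_string(i);
        windows.push_back(WindowOps{
            [&log, id] { log.push_back("ctx" + id); },
            [&log, id] { log.push_back("draw" + id); },
            [&log, id] { log.push_back("flush" + id); },
            [&log, id] { log.push_back("swap" + id); }});
    }
    return windows;
}
```

Keeping the four steps behind callbacks makes it easy to reorder them (for example, moving the swap earlier) without touching the loop itself.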
When I have 20+ windows spread across all GPUs, GPUView produces a very nice graph where the FIFOs are stacked maybe 3-4 levels high.
As the number of windows gets higher, to around 32 windows spread across 4 GPUs, the FIFO eventually drops to a single level, and you see some gaps between window blits per GPU. At this point, the GPUs are still operating in parallel.
GPU0 xxxxxxx xxxx xxxxx xxxxxx xxx
GPU1 xxxxx xxx xxxxxx xxxxx xxxxxx
GPU2 xxx xxxxxxx xxxxxx xxxxx xxxx
GPU3 xxxxxx xxxxxxx xxxxx xxxx xxx
Now comes the issue. When the number of windows gets to 50+, closer to 60, SwapBuffers suddenly stalls significantly, and parallelism seems lost. The gaps between blits are huge, and it's almost as if things start to look serialized.
GPU0 xxxxxxx xxxx xxxxx
GPU1 xxxxx xxx xxxxxx
GPU2 xxx xxxxx
GPU3 xxxxxx xxxxxxx
If I place a critical section around each thread's work, then I get
xxxxxxxxxxxxxxxxxxxxxx
GPU0 xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx
GPU1 xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx
GPU2 xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx
GPU3 xxxxxxxxxxxxxxxxxxxxxxxxx
All the blits per GPU are tightly packed (2-4 levels high, without gaps) but serialized across GPUs by the critical section, which is expected.
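The critical-section experiment can be sketched in portable C++ as below. This is an assumption-laden stand-in rather than the real code: `std::mutex` plays the role of a Win32 `CRITICAL_SECTION`, and `runGpuThreads` and its parameters are hypothetical names. The function returns the peak number of threads seen inside a frame at once, which is 1 whenever the shared lock is used.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Runs one worker thread per GPU. When `serialize` is true, every worker
// takes the same lock around its frame, so only one GPU thread submits at a
// time. Returns the highest number of workers observed inside a frame at once.
int runGpuThreads(int gpuCount, int framesPerGpu, bool serialize,
                  const std::function<void(int)>& frame) {
    std::mutex submitLock;        // stand-in for a shared CRITICAL_SECTION
    std::atomic<int> inside{0};   // workers currently inside a frame
    std::atomic<int> peak{0};     // maximum of `inside` seen so far

    auto worker = [&](int gpu) {
        for (int f = 0; f < framesPerGpu; ++f) {
            std::unique_lock<std::mutex> guard(submitLock, std::defer_lock);
            if (serialize) guard.lock();

            const int now = ++inside;
            int seen = peak.load();
            // Raise `peak` to `now` unless another thread already did.
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {}

            frame(gpu);  // draw + swap for all windows on this GPU
            --inside;
        }
    };

    std::vector<std::thread> threads;
    for (int g = 0; g < gpuCount; ++g) threads.emplace_back(worker, g);
    for (std::thread& t : threads) t.join();
    return peak.load();
}
```

With `serialize` set to false the same harness lets the GPU threads overlap, which is the case where the stalls described here show up.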
If I move SwapBuffers earlier, away from the OpenGL draw, I see the issue happening at the SwapBuffers call, while the OpenGL draws are packed closely together in GPUView:
for every thread (GPU) {
    while (vsyncTrigger) {
        SwapBuffers() <<<<<<<<<<
        for all windows {
            do OpenGLContextSwitch
            do OpenGL operations ( )
            FlushGL()
        }
    }
}
So the issue "seems" to be the number of SwapBuffers calls, and/or DWM?
Also, if I put a Sleep after a certain number of blits, it does help performance a bit.
Window size doesn't matter. I can do the test with 50+ tiny windows and the problem still occurs.
What matters is the number of blits/SwapBuffers calls/windows. As this increases, the FIFO depth decreases, and the gap between blits increases.
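The Sleep experiment mentioned above could look something like this sketch. It is only a guess at the shape of the workaround: `swapWithPauses`, `batch` and `pause` are hypothetical names and tuning knobs, and `std::this_thread::sleep_for` is used as a portable stand-in for the Win32 `Sleep` call.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Calls `swapOne(i)` for each window and pauses after every `batch` swaps,
// except after the final window. Returns how many pauses were taken.
// `batch` and `pause` are tuning guesses, not values from the original post.
int swapWithPauses(int windowCount, int batch, std::chrono::milliseconds pause,
                   const std::function<void(int)>& swapOne) {
    int pauses = 0;
    for (int i = 0; i < windowCount; ++i) {
        swapOne(i);  // e.g. SwapBuffers for window i

        const bool batchDone = batch > 0 && (i + 1) % batch == 0;
        const bool moreToDo = i + 1 < windowCount;
        if (batchDone && moreToDo) {
            std::this_thread::sleep_for(pause);  // let queued presents drain
            ++pauses;
        }
    }
    return pauses;
}
```

A batch size of 0 disables the pauses, which makes it simple to compare GPUView traces with and without throttling.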
Any clues or ideas on what I can try in order to fix this behavior? Maybe there are some settings or calls I should make to flush something.
This is running in windowed mode, so I guess SwapBuffers copies the window buffer to some DWM composition buffer or something similar.
- Patrick
Thanks for your question. I'll need to check with the GL driver team. Will get back to you once I hear back.