
Weird stall in SwapBuffers (multi-GPU)

I am writing a multithreaded program that works with multiple GPUs.

This is a WINDOWED application (meaning DWM is active).

Basically it's one thread per GPU, and each thread's loop roughly goes like this:

for every thread (GPU) {
    while (vsyncTrigger) {
        for all windows {
            do OpenGLContextSwitch
            do OpenGL operations()
            FlushGL()
            SwapBuffers()
        }
    }
}
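In Win32/WGL terms, the loop above is roughly the following. This is a minimal sketch only: window and context creation is omitted, and WindowSlot, GpuThread and WaitForVsyncTrigger are hypothetical names of mine, not real API.

    #include <windows.h>
    #include <GL/gl.h>
    #include <vector>
    #pragma comment(lib, "opengl32.lib")
    #pragma comment(lib, "gdi32.lib")

    struct WindowSlot {
        HDC   hdc;   // per-window device context
        HGLRC hglrc; // GL context created for this window on this GPU's adapter
    };

    // Hypothetical stand-in for the real per-GPU vsync trigger; a fixed sleep is
    // used here only so the sketch is self-contained.
    static void WaitForVsyncTrigger() { Sleep(16); }

    // One instance of this runs per GPU, each on its own thread (e.g. via CreateThread).
    static DWORD WINAPI GpuThread(LPVOID param)
    {
        auto* gpuWindows = static_cast<std::vector<WindowSlot>*>(param);

        for (;;) {
            WaitForVsyncTrigger();

            for (WindowSlot& w : *gpuWindows) {
                wglMakeCurrent(w.hdc, w.hglrc); // OpenGL context switch
                // ... issue the OpenGL draw calls for this window ...
                glFlush();                      // FlushGL()
                SwapBuffers(w.hdc);             // present; composited by DWM in windowed mode
            }
        }
        return 0; // unreachable in this sketch
    }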

When I have around 20+ windows spread across all GPUs, GPUView produces a very nice graph where the FIFOs are stacked maybe 3-4 levels high.

As the number of windows grows to around 32 spread across 4 GPUs, the FIFO eventually drops to a single level, and you see some gaps between the window blits on each GPU. At this point the GPUs are still operating in parallel:

GPU0      xxxxxxx  xxxx  xxxxx  xxxxxx  xxx

GPU1         xxxxx  xxx   xxxxxx  xxxxx  xxxxxx

GPU2         xxx  xxxxxxx  xxxxxx  xxxxx  xxxx

GPU3            xxxxxx  xxxxxxx  xxxxx  xxxx  xxx

Now comes the issue. When the number of windows reaches 50+, closer to 60, SwapBuffers suddenly stalls significantly and parallelism seems lost. The gaps between blits become huge, and things almost start to look serialized:

GPU0            xxxxxxx                                      xxxx                                                         xxxxx              

GPU1                           xxxxx                                 xxx                                        xxxxxx                      

GPU2                                        xxx                                         xxxxx                                                          

GPU3                xxxxxx                                                   xxxxxxx   

If I place a critical section around each thread's per-window loop, then I get:

                  xxxxxxxxxxxxxxxxxxxxxx          

GPU0      xxxxxxxxxxxxxxxxxxxxxxxxx

                                                            xxxxxxxxxxxxxxxxxxxxxx

GPU1                                                   xxxxxxxxxxxxxxxxxxxxxxxxx

                                                                                                             xxxxxxxxxxxxxxxxxxxxxx

GPU2                                                                                                     xxxxxxxxxxxxxxxxxxxxxxxx

                                                                                                                                                              xxxxxxxxxxxxxxxxxxxxxx

GPU3                                                                                                                                                    xxxxxxxxxxxxxxxxxxxxxxxxx

All the blits per GPU are tightly packed (2-4 levels high, without gaps) but serialized across GPUs by the critical section, which is expected.
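Concretely, the critical-section experiment looks roughly like this (reusing the hypothetical WindowSlot type from the sketch above; gSwapLock is a single CRITICAL_SECTION shared by all GPU threads and initialized once with InitializeCriticalSection):

    extern CRITICAL_SECTION gSwapLock;  // shared by all GPU threads

    static void RenderWindowsSerialized(std::vector<WindowSlot>& gpuWindows)
    {
        // Only one GPU thread at a time may submit and present; this is what
        // serializes the GPUs in the GPUView capture above.
        EnterCriticalSection(&gSwapLock);
        for (WindowSlot& w : gpuWindows) {
            wglMakeCurrent(w.hdc, w.hglrc);
            // ... OpenGL draw calls for this window ...
            glFlush();
            SwapBuffers(w.hdc);
        }
        LeaveCriticalSection(&gSwapLock);
    }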

If I move the SwapBuffers earlier, away from the OpenGL draws, I see the issue happening at the SwapBuffers call instead, while the OpenGL draws are packed closely together in GPUView:

for every thread (GPU) {
    while (vsyncTrigger) {
        SwapBuffers()    <<<<<<<<<< moved here
        for all windows {
            do OpenGLContextSwitch
            do OpenGL operations()
            FlushGL()
        }
    }
}

So the issue "seems" to  the amount of SwapBuffers ??  and or DWM ?

Also, if I put a Sleep after a certain number of blits, it does help performance a bit.

Window size doesn't matter. I can run the test with 50+ tiny windows and the problem still occurs.

The "number" of blits/SwapBuffers/windows is what matters: as it increases, the FIFO depth decreases and the gaps between blits increase.

Any clues/ideas on what I can try in order to fix this behavior? Maybe there are some settings/calls I should make to flush something.

This is running in windowed mode, so I guess SwapBuffers copies the window's back buffer to some DWM composition buffer or something.
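One experiment I'm considering along those lines (purely an assumption on my part, not something I've verified helps): since DWM composition is in the presentation path, pace each thread against the compositor with DwmFlush() and set the per-window swap interval to 0, so SwapBuffers itself doesn't also wait on vblank. Roughly:

    // Experimental sketch only: assumes the WGL_EXT_swap_control extension is
    // available and that DWM composition is enabled.
    #include <windows.h>
    #include <dwmapi.h>   // DwmFlush
    #include <GL/gl.h>
    #pragma comment(lib, "dwmapi.lib")
    #pragma comment(lib, "opengl32.lib")

    typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

    static void ConfigureSwapForCompositor()
    {
        // With a current GL context: ask the driver not to wait for vblank in
        // SwapBuffers itself (interval 0), since pacing is done elsewhere.
        auto wglSwapIntervalEXT = reinterpret_cast<PFNWGLSWAPINTERVALEXTPROC>(
            wglGetProcAddress("wglSwapIntervalEXT"));
        if (wglSwapIntervalEXT)
            wglSwapIntervalEXT(0);
    }

    static void WaitForCompositor()
    {
        // Blocks until DWM has finished its current composition pass; one call
        // per frame per thread, instead of every SwapBuffers throttling on its own.
        DwmFlush();
    }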

- Patrick

dwitczak (Staff)

Thanks for your question. I'll need to check with the GL driver team. Will get back to you once I hear back.
