0 Replies Latest reply on May 16, 2017 2:45 PM by patrickchew1234

    Weird stall in SwapBuffers (multi-GPU)

    patrickchew1234

      I am writing a multithreaded program that works with multiple GPUs.

      This is a WINDOWED application (meaning DWM is active).

      Basically it's one thread per GPU, and it roughly goes like this:

       

      for every thread (one per GPU) {
         while (vsyncTrigger) {
               for all windows {
                   do OpenGLContextSwitch
                   do OpenGL operations ( )
                   FlushGL()
                   SwapBuffers()
               }
         }
      }

       

      When I have 20+ windows spread across all GPUs, GPUView shows a very nice graph where the FIFO queues are stacked maybe 3-4 levels high.

       

       

      As the number of windows grows to around 32, spread across 4 GPUs, the FIFO depth eventually drops to a single level, and gaps appear between the window blits on each GPU. At this point, the GPUs are still operating in parallel.

       

      GPU0      xxxxxxx  xxxx  xxxxx  xxxxxx  xxx

      GPU1         xxxxx  xxx   xxxxxx  xxxxx  xxxxxx

      GPU2         xxx  xxxxxxx  xxxxxx  xxxxx  xxxx

      GPU3            xxxxxx  xxxxxxx  xxxxx  xxxx  xxx

       

      Now comes the issue. When the number of windows gets to 50+, closer to 60, SwapBuffers suddenly stalls significantly and parallelism seems to be lost. The gaps between blits are huge, and it looks almost as if things are serialized.

       

      GPU0            xxxxxxx                                      xxxx                                                         xxxxx              

      GPU1                           xxxxx                                 xxx                                        xxxxxx                      

      GPU2                                        xxx                                         xxxxx                                                          

      GPU3                xxxxxx                                                   xxxxxxx   

       

      If I place a critical section per thread, then I get

       

                        xxxxxxxxxxxxxxxxxxxxxx          

      GPU0      xxxxxxxxxxxxxxxxxxxxxxxxx

                                                                  xxxxxxxxxxxxxxxxxxxxxx

      GPU1                                                   xxxxxxxxxxxxxxxxxxxxxxxxx

                                                                                                                   xxxxxxxxxxxxxxxxxxxxxx

      GPU2                                                                                                     xxxxxxxxxxxxxxxxxxxxxxxx

                                                                                                                                                                    xxxxxxxxxxxxxxxxxxxxxx

      GPU3                                                                                                                                                    xxxxxxxxxxxxxxxxxxxxxxxxx

       

       

       

      All the blits per GPU are tightly packed (2-4 levels high, without gaps) but serialized across GPUs by the critical section, which is expected.

       

      If I move the SwapBuffers call earlier, away from the OpenGL draw, I see the stall happening at SwapBuffers, while the OpenGL draws are packed closely together in GPUView:

      for every thread (one per GPU) {
         while (vsyncTrigger) {
               SwapBuffers()    <<<<<<<<<<
               for all windows {
                   do OpenGLContextSwitch
                   do OpenGL operations ( )
                   FlushGL()
               }
         }
      }
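
      To cross-check what GPUView shows, the time spent blocked inside each call can also be measured on the CPU side. This is only a minimal sketch: `timeCallMs` and `StallStats` are hypothetical helper names, not part of any API, and the real `SwapBuffers(hdc)` call would be passed in as the callable.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `call` and returns how long it blocked, in milliseconds.
double timeCallMs(const std::function<void()>& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical per-thread accumulator for the measured durations.
struct StallStats {
    int calls = 0;
    double totalMs = 0.0;
    double worstMs = 0.0;

    // Record one measured duration.
    void add(double ms) {
        assert(ms >= 0.0);
        ++calls;
        totalMs += ms;
        if (ms > worstMs) worstMs = ms;
    }

    // Average duration over all recorded calls (0 if none were recorded).
    double meanMs() const { return calls ? totalMs / calls : 0.0; }
};
```

      In the per-GPU loop this would be used as `stats.add(timeCallMs([&] { SwapBuffers(hdc); }));`, with one `StallStats` per thread so that no locking is needed. Comparing the mean and worst values at 20, 32 and 50+ windows should show whether the stall grows gradually or jumps at some window count.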

       

      So the issue seems to be related to the number of SwapBuffers calls, and/or DWM?

      Also, if I put a Sleep after a certain number of blits, it does help performance a bit.
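
      A minimal sketch of that Sleep-based throttling, assuming a fixed batch size and pause length (both values would need tuning, and `SwapThrottle` is a hypothetical name, not an existing API):

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical throttle: after every `batchSize` presents, pause for `pause`.
class SwapThrottle {
public:
    SwapThrottle(int batchSize, std::chrono::milliseconds pause)
        : batchSize_(batchSize), pause_(pause) {
        assert(batchSize > 0);
    }

    // Call once after each SwapBuffers(); returns true if this call slept.
    bool onSwap() {
        if (++sinceLastPause_ < batchSize_) return false;
        sinceLastPause_ = 0;
        ++pauses_;
        std::this_thread::sleep_for(pause_);
        return true;
    }

    // Number of pauses taken so far.
    int pauses() const { return pauses_; }

private:
    int batchSize_;
    std::chrono::milliseconds pause_;
    int sinceLastPause_ = 0;
    int pauses_ = 0;
};
```

      Each GPU thread would own one `SwapThrottle` and call `onSwap()` right after its SwapBuffers call, so the pause is applied per thread and does not itself serialize the GPUs.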

      Window size doesn't matter. I can do the test with 50+ tiny windows and the problem still occurs.

      The number of blits/SwapBuffers calls/windows is what matters. As it increases, the FIFO depth decreases and the gaps between blits increase.

       

      Any clues or ideas on what I can try in order to fix this behavior? Maybe there are some settings or calls I should make to flush something.

      This is running in windowed mode, so I guess SwapBuffers copies the window buffer to some DWM composition buffer or something similar.

       

      - Patrick