2 replies. Latest reply on Nov 17, 2012 4:34 PM by ljbade

    Simultaneous texture upload and rendering in OpenGL

    ljbade

      I am currently experimenting to figure out the best way to load textures in OpenGL using multiple threads. My final project will do a lot of texture streaming, so I wanted to work out the optimal design first.

      I have created a simple application that uses two threads: one does some simple rendering in a loop, and the other loads a large (~100 MB) texture to make any stalls obvious.

      I followed the design recommended by NVIDIA: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0356-GTC2012-Texture-Transfers.pdf

      Basically, the render thread creates two GL contexts that share objects and assigns one to each thread.

      The upload thread creates a pixel buffer object (PBO) and a texture object, maps the buffer with glMapBufferRange using the write-only and invalidate flags, and reads the image data from disk directly into it. It then calls glTexSubImage2D to load the data into the texture and inserts a GPU fence with glFenceSync, whose handle is passed back to the render thread.

      The render thread runs in a loop and every frame uses glGetSynciv to test whether the upload fence has been signalled. If it has not, it simply renders another frame; if it has, it binds the texture and renders with it.
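
      A minimal C++ sketch of that non-blocking per-frame check, written so it runs without a GL context. `FakeFence` and `render_until_ready` are hypothetical names: the atomic flag stands in for the GLsync returned by glFenceSync, and the comments only indicate where the real glGetSynciv query and glBindTexture call would go.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a GLsync fence. In real code the upload thread
// would create it with glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0), and the
// render thread would query it with
//   GLint status;
//   glGetSynciv(sync, GL_SYNC_STATUS, 1, nullptr, &status);
// and then test status == GL_SIGNALED.
struct FakeFence {
    std::atomic<bool> signaled{false};
    void signal() { signaled.store(true, std::memory_order_release); }
    bool is_signaled() const { return signaled.load(std::memory_order_acquire); }
};

// Renders up to max_frames "frames", checking the fence once per frame and
// never waiting on it. Returns how many frames were drawn before the new
// texture became usable (max_frames if it never did).
int render_until_ready(const FakeFence& fence, int max_frames) {
    int frames_without_texture = 0;
    for (int frame = 0; frame < max_frames; ++frame) {
        if (fence.is_signaled()) {
            // Real code: glBindTexture(GL_TEXTURE_2D, tex) and draw with it.
            return frames_without_texture;
        }
        // Not signalled yet: draw this frame with the textures already resident.
        ++frames_without_texture;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // stand-in for frame work
    }
    return frames_without_texture;
}
```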

      The problem I am having is that glTexSubImage2D always stalls the render thread in SwapBuffers until the upload finishes. It does this even if the render thread never binds the texture, so no rendering depends on it.

      I also tried the AMD_pinned_memory extension, used the same way as in the extension's example, but it made no difference.

      What is the correct way to perform simultaneous texture loading and rendering on an AMD GPU without causing rendering stalls?

      Is there a good example somewhere?

      I notice that Rage runs very smoothly on my laptop despite streaming significant amounts of texture data on the fly, so it must be possible.

        • Re: Simultaneous texture upload and rendering in OpenGL
          gsellers

          Hi,

          I would recommend not using multiple OpenGL contexts for this particular use case. Every time you use an object in one shared context that was modified in the other, the driver needs to synchronize the two threads, which can be quite costly.

          I would suggest that in your render thread you create a PBO and map it. Use some form of signal in your application, such as an event or semaphore, to indicate that the buffer has been mapped and that the pointer to the PBO is valid. Then, in your loader thread, write the data into the mapped PBO. Once the loader reports that the data is in the PBO, the render thread can unmap it and call glTexSubImage2D to copy the data into the texture.
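
          A rough C++ model of this handoff, assuming a mutex and condition variable play the role of the "event or semaphore". `UploadSlot`, `publish_mapping`, `loader_thread` and `upload_ready` are hypothetical names, and a plain byte array stands in for the pointer returned by glMapBufferRange. The point it illustrates is that only the render thread would make GL calls; the loader thread only writes to memory.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical handoff state shared by the render thread (owns the GL context)
// and the loader thread (never calls GL).
struct UploadSlot {
    std::mutex m;
    std::condition_variable cv;
    std::uint8_t* mapped = nullptr;  // stands in for the glMapBufferRange pointer
    std::size_t size = 0;
    bool filled = false;
};

// Render thread: publish the mapped PBO pointer. Real code (assumption):
//   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
//   ptr = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size,
//                          GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
void publish_mapping(UploadSlot& slot, std::uint8_t* ptr, std::size_t size) {
    {
        std::lock_guard<std::mutex> lock(slot.m);
        slot.mapped = ptr;
        slot.size = size;
    }
    slot.cv.notify_one();  // wake the loader waiting for a valid pointer
}

// Loader thread: wait for the pointer, then copy the image bytes into it.
void loader_thread(UploadSlot& slot, const std::vector<std::uint8_t>& image) {
    std::uint8_t* dst = nullptr;
    std::size_t size = 0;
    {
        std::unique_lock<std::mutex> lock(slot.m);
        slot.cv.wait(lock, [&slot] { return slot.mapped != nullptr; });
        dst = slot.mapped;
        size = slot.size;
    }
    // Copy outside the lock so the render thread's per-frame check never
    // waits for the copy itself.
    std::memcpy(dst, image.data(), std::min(size, image.size()));
    std::lock_guard<std::mutex> lock(slot.m);
    slot.filled = true;
}

// Render thread, once per frame: cheap completion check. When it returns true,
// real code would call glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER) and then
// glTexSubImage2D with a null data pointer (offset 0 into the bound PBO).
bool upload_ready(UploadSlot& slot) {
    std::lock_guard<std::mutex> lock(slot.m);
    return slot.filled;
}
```

          In a real renderer the render thread would keep drawing frames and call `upload_ready` once per frame instead of spinning; the unmap step matters because GL generally refuses to source pixel data from a buffer that is still mapped.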

          If you can, don't use the modified texture immediately. Try to delay using it by a frame or two; this may help the driver avoid stalls between the upload and the use of the texture.
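
          One way to apply the "delay by a frame or two" advice is to record the frame in which glTexSubImage2D was issued and skip binding the texture until a fixed number of frames later. The sketch assumes a hypothetical `DeferredTexture` record and a delay of two frames; the right value depends on how many frames the driver queues, so treat it as a tuning knob rather than a guarantee.

```cpp
#include <cstdint>

// Hypothetical bookkeeping for one streamed texture.
struct DeferredTexture {
    std::uint64_t upload_frame = 0;  // frame in which glTexSubImage2D was issued
    bool uploaded = false;
};

// Assumed delay; not a driver-defined constant.
constexpr std::uint64_t kUseDelayFrames = 2;

// Call on the frame in which the upload (glTexSubImage2D) is issued.
void mark_uploaded(DeferredTexture& t, std::uint64_t frame) {
    t.upload_frame = frame;
    t.uploaded = true;
}

// True once enough frames have passed since the upload that binding the
// texture is less likely to make the GPU wait on the copy.
bool can_use(const DeferredTexture& t, std::uint64_t frame) {
    return t.uploaded && frame >= t.upload_frame + kUseDelayFrames;
}
```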

          Cheers,

          Graham

            • Re: Simultaneous texture upload and rendering in OpenGL
              ljbade

              I am going to try that next.

              I also found this OpenGL Insights chapter on asynchronous buffer transfers: http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf

              It contains a lot of performance data, and the authors suggest the same approach you do: map a pointer (or use pinned memory) and share it with the loader thread.

              I note that they say AMD GPUs could not perform memory transfers and rendering simultaneously (at least with the drivers from a year ago). Is that still the case?

              One thing reading that chapter has shown me is that OpenGL can be a very tricky API because of all its performance traps, implementation differences, loose specifications, etc. I think this is one area where Direct3D does a much better job: it is very strict in design, and all drivers are thoroughly tested by Microsoft so that they behave exactly the same way.