10 Replies Latest reply on Apr 26, 2013 11:36 AM by nowhere-01

    [OpenGL] occlusion query extremely slow, conditional rendering broken

    nowhere-01

      Occlusion Query Test conditions:

      Windows 7 x64,

      Catalyst 13.1,

      Gigabyte HD 6670 (bought especially for testing),

      Occlusion query FBO size 256x256,

      About 10-12 objects of different size and distance passing frustum test to be rendered for occlusion test,

      Every object about 2-3k triangles,

      Rendering with glDrawElements, no deprecated stuff.

       

      Result:

      Putting glFinish and a timer after the occlusion pass shows that the occlusion query is ready in 25 ms on average. It seems heavily fillrate-limited, because rendering bounding boxes instead doesn't help, and about 50 small billboards rendered for lens-flare occlusion take 1-2 ms to finish. It's terrible, but it gets worse if I don't force the occlusion query to finish and instead check whether it's ready on the next frame, like this:

       

      loop() {
          checkOcclusionQuery();    // get occlusion query results from the previous frame, if available
          renderToOcclusionQuery();
          renderTheScene();
      }

       

      This loop runs at 30 fps, so every frame takes 33 ms to finish. Enough for our query, right? But it's not ready by the next frame. In fact, it takes 3-4 frames of 33 ms each until the occlusion query is ready: 105 ms on average. It's ridiculous. 25 ms is unacceptable even for 10 times the number of objects I render; 105 ms marks this as a broken piece of functionality. It seems like the occlusion query result is getting delayed by other commands pushed to the pipeline, which makes it useless.
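A minimal sketch of the non-blocking polling pattern in the loop above, assuming a small ring of query slots so that a new query can be issued each frame while older ones are still pending. `FakeGpu`, `OcclusionTracker` and `framesUntilHidden` are hypothetical names; `FakeGpu` stands in for the GL query-object calls so the example runs without a GL context, and its fixed `latencyFrames` is an assumption, not a measured driver value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the GL query-object API, so the sketch runs without a context.
// In real code issue() would wrap glBeginQuery/glEndQuery(GL_SAMPLES_PASSED), available()
// would read GL_QUERY_RESULT_AVAILABLE and result() GL_QUERY_RESULT via glGetQueryObjectuiv.
struct FakeGpu {
    struct Pending { int issuedAt; std::uint32_t samples; };

    explicit FakeGpu(int latency) : latencyFrames(latency) {}

    void issue(std::size_t slot, std::uint32_t samples) {
        assert(slot < slots.size());
        slots[slot] = Pending{frame, samples};
    }
    bool available(std::size_t slot) const { return frame - slots[slot].issuedAt >= latencyFrames; }
    std::uint32_t result(std::size_t slot) const { return slots[slot].samples; }
    void endFrame() { ++frame; }

    int latencyFrames;               // assumed driver + hardware latency, in frames
    int frame = 0;
    std::array<Pending, 8> slots{};
};

struct OcclusionTracker {
    static constexpr std::size_t kSlots = 4;  // assumed ring size
    std::size_t head = 0;     // next slot to issue into
    std::size_t tail = 0;     // oldest slot still pending
    std::size_t pending = 0;
    bool visible = true;      // conservative default: draw until a result says otherwise

    // Consume every finished query, oldest first, without ever blocking.
    void poll(const FakeGpu& gpu) {
        while (pending > 0 && gpu.available(tail)) {
            visible = gpu.result(tail) > 0;
            tail = (tail + 1) % kSlots;
            --pending;
        }
    }
    // Issue a new query only if a slot is free; otherwise skip the test this frame.
    bool issue(FakeGpu& gpu, std::uint32_t samples) {
        if (pending == kSlots) return false;
        gpu.issue(head, samples);
        head = (head + 1) % kSlots;
        ++pending;
        return true;
    }
};

// Frame index at which a fully occluded object is first known to be hidden, or -1.
int framesUntilHidden(int latencyFrames) {
    FakeGpu gpu(latencyFrames);
    OcclusionTracker tracker;
    for (int f = 0; f < 32; ++f) {
        tracker.poll(gpu);            // checkOcclusionQuery()
        if (!tracker.visible) return f;
        tracker.issue(gpu, 0);        // renderToOcclusionQuery(): zero samples pass
        gpu.endFrame();               // renderTheScene() and present
    }
    return -1;
}
```

With this arrangement the last known visibility is reused until a newer result arrives, so the frame rate never depends on query latency; the cost is that visibility changes lag by however many frames the driver keeps in flight.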

       

      Conditional Rendering:

      I've seen that this has already been reported, but it was described poorly. A bit of detail: conditional rendering always passes if you use it with GL_QUERY_NO_WAIT, and it stalls the GPU to the point of crashing the driver if you use GL_QUERY_WAIT. Forcibly waiting for the occlusion query to finish before using conditional rendering doesn't make it work. It's completely broken.
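For reference, a small model of how the two conditional-render modes are specified to behave, assuming only the documented semantics of glBeginConditionalRender. `WaitMode`, `QueryState` and `executesDraw` are hypothetical names used for illustration. It does not reproduce or explain the driver bug reported here, where drawing happens even after the query has been forced to finish.

```cpp
// Stand-ins for GL_QUERY_WAIT and GL_QUERY_NO_WAIT.
enum class WaitMode { Wait, NoWait };

struct QueryState {
    bool available;          // has the result already come back from the GPU?
    unsigned samplesPassed;  // the result, once it is known
};

// Returns true if the commands between glBeginConditionalRender and
// glEndConditionalRender would be executed under this model.
bool executesDraw(WaitMode mode, QueryState q) {
    if (!q.available && mode == WaitMode::NoWait) {
        // With GL_QUERY_NO_WAIT the GL may draw unconditionally rather than wait.
        // This model assumes it always takes that option.
        return true;
    }
    // With GL_QUERY_WAIT (or an available result) the draw depends on the samples count.
    return q.samplesPassed > 0;
}
```

Under this model, an occluded object is skipped with GL_QUERY_NO_WAIT only when the result is already available at the time the conditional block is reached.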

       

       

      P.S.: As a developer, I'm really pissed off with your current implementation of OpenGL. Since I dealt with AMD cards 3 years ago, you haven't improved much. Your drivers are still full of critical issues, your employees never visit the OpenGL.org forums anymore, and I have no desire to support your hardware.

        • Re: [OpenGL] occlusion query extremely slow, conditional rendering broken
          gsellers

          Hi,

          nowhere-01 wrote:

           

          This loop runs at 30 fps, so every frame takes 33 ms to finish. Enough for our query, right? But it's not ready by the next frame. In fact, it takes 3-4 frames of 33 ms each until the occlusion query is ready.

          That is actually expected behavior. When the driver submits a command buffer to the operating system, it sits in a software queue that's at least a frame deep, usually more. Also, once the command buffer hits the hardware, there's generally one in flight and one in the hardware queue. Depending on what your application does, it can easily add up to 3-4 frames of latency.
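As a rough worked example of that arithmetic, the sketch below adds up the stages named above. The depths passed in are illustrative assumptions (one frame per stage), not values taken from any driver, and `PipelineDepth`, `framesOfLatency` and `latencyMs` are hypothetical names.

```cpp
// Illustrative pipeline depths, in frames. These are assumptions for the example,
// not values reported by the driver.
struct PipelineDepth {
    int softwareQueueFrames;  // command buffers waiting in the OS software queue
    int hardwareQueueFrames;  // submitted to the hardware queue, not yet started
    int inFlightFrames;       // currently executing on the GPU
};

// Frames between submitting a query and being able to read its result.
int framesOfLatency(const PipelineDepth& d) {
    return d.softwareQueueFrames + d.hardwareQueueFrames + d.inFlightFrames;
}

// The same latency in milliseconds for a given frame time.
double latencyMs(const PipelineDepth& d, double frameMs) {
    return framesOfLatency(d) * frameMs;
}
```

With one frame in each stage and a 33 ms frame, `latencyMs(PipelineDepth{1, 1, 1}, 33.0)` gives 99 ms, which is in the same range as the roughly 105 ms reported in the original post.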

           

          nowhere-01 wrote:

           

          Conditional Rendering:

          I've seen that this has already been reported, but it was described poorly. A bit of detail: conditional rendering always passes if you use it with GL_QUERY_NO_WAIT, and it stalls the GPU to the point of crashing the driver if you use GL_QUERY_WAIT. Forcibly waiting for the occlusion query to finish before using conditional rendering doesn't make it work. It's completely broken.

          Yes, we have seen that report and are working on a fix. We'll get it into a driver ASAP.

          nowhere-01 wrote:

          P.S.: As a developer, I'm really pissed off with your current implementation of OpenGL. Since I dealt with AMD cards 3 years ago, you haven't improved much. Your drivers are still full of critical issues, your employees never visit the OpenGL.org forums anymore, and I have no desire to support your hardware.

          I'm very sorry to hear that. We do read opengl.org forums and our employees do post there (just not always identifying themselves as AMD employees). However, I assure you that if you post here about an AMD OpenGL specific issue, you will get a response from me.

           

          Cheers,

           

          Graham

            • Re: [OpenGL] occlusion query extremely slow, conditional rendering broken
              nowhere-01

              Do you think that the initial 25 ms is acceptable for a light test scene? You didn't comment on that part directly. OK, for comparison, on a GTX 560Ti the same occlusion query finishes in less than 1 ms; you can't even measure it with clock(). And the thing is, occlusion query is used this way: you render to occlusion, you render the rest of the stuff, and then you ask for results. And on any modern GPU with a sane number of objects you can be sure it's going to finish within one frame. The 6670 should be close to a GTX 260, but it performs more like a GeForce 8400. It's about 5-7 times slower than the GTX 560Ti, and occlusion query worked better on FX series cards. Not to mention that I render to the occlusion query only objects that are visible in the frustum and don't take up much of the screen (somewhat distant). Now, for AMD GPUs I am forced to put glFinish at the end of the current frame to make sure the occlusion query will be ready. And I plan to just turn it off if timeouts are too big for several frames, because there's no point in it as an optimisation tool on your hardware right now. Even if it's twice as fast on your higher-end GPUs, it's still ridiculous. You should fix it.
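The fallback mentioned above (turning occlusion culling off when results are late for several frames in a row) could be sketched roughly as follows. `OcclusionWatchdog` is a hypothetical helper and both thresholds are placeholders to be tuned per application.

```cpp
// Disables occlusion culling once query results have been late for several
// consecutive frames. Both thresholds are assumptions, not recommended values.
class OcclusionWatchdog {
public:
    OcclusionWatchdog(int maxLatencyFrames, int maxSlowFrames)
        : maxLatencyFrames_(maxLatencyFrames), maxSlowFrames_(maxSlowFrames) {}

    // Call once per frame with the age, in frames, of the oldest query
    // whose result has still not been returned.
    void update(int oldestPendingAgeFrames) {
        if (oldestPendingAgeFrames > maxLatencyFrames_) {
            if (++slowStreak_ >= maxSlowFrames_) enabled_ = false;
        } else {
            slowStreak_ = 0;  // a timely result resets the streak
        }
    }

    // While this returns false, draw everything that passes the frustum test.
    bool enabled() const { return enabled_; }

private:
    int maxLatencyFrames_;
    int maxSlowFrames_;
    int slowStreak_ = 0;
    bool enabled_ = true;
};
```

Unlike a glFinish at the end of the frame, this never stalls the CPU; once the watchdog trips, the application simply stops relying on query results.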

                • Re: [OpenGL] occlusion query extremely slow, conditional rendering broken
                  gsellers

                  nowhere-01 wrote:

                   

                  Do you think that the initial 25 ms is acceptable for a light test scene?

                  Without knowing what the frame does, it's hard to comment. However, for a simple scene it does seem slow. There may well be some kind of performance issue here, but that is somewhat orthogonal to the number of frames in flight.

                  nowhere-01 wrote:

                   

                  The thing is, occlusion query is used this way: you render to occlusion, you render the rest of the stuff, and then you ask for results. And on any modern GPU with a sane number of objects you can be sure it's going to finish within one frame.

                  That's actually not really true. For example, see this chapter in GPU Gems 2 (which is available on the NVIDIA website): http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html. In particular:

                  GPU Gems 2, Chapter 6:

                   

                  This method works well if the tested object is really complex, but step 5 involves waiting until the result of the query actually becomes available. Since, for example, Direct3D allows a graphics driver to buffer up to three frames of rendering commands, waiting for a query results in potentially large delays.

                  nowhere-01 wrote:

                   

                  Now, for AMD GPUs I am forced to put glFinish at the end of the current frame to make sure the occlusion query will be ready.

                  That would be really bad for performance and would mean that your application doesn't work well on your customers' machines, even if we improve occlusion query response times. That really is the worst case scenario.

                  nowhere-01 wrote:

                   

                  You should fix it.

                  We're on it... step (1) is knowing there's a problem, so thanks for reporting it. Is there any chance you could share your application? I'd love to see how much faster we can get it to go.

                   

                  Cheers,

                   

                  Graham

                    • Re: [OpenGL] occlusion query extremely slow, conditional rendering broken
                      nowhere-01

                      You are referring to an article which is 7+ years old. Do you compete with 7-year-old GPUs? I've given you an example of how occlusion query works on a modern mid-range GPU: its delay is negligible. No one would use it as an optimisation tool if it really had a 3-4 frame delay on the majority of hardware. Have you seen objects popping in over 3-4 frames? I've watched it for hours trying to make occlusion query work properly on an AMD card. It's unacceptable. On any graphics programming forum you always see the same algorithm mentioned: you render to occlusion at the beginning of the frame, you render the rest of the passes, you get the occlusion query results. That's how things are done now, not 7 years ago.

                       

                      I am aware that putting in glFinish is the worst thing. I would never put it in a release, but the funny thing is that with glFinish it feels better on the 6670: it's much less sluggish (it can barely make 30 fps, where a GTX 260 does 70-80), and the occlusion query doesn't lag.

                       

                      I'll try to put together something like a small demo of spheres occluding each other with GLUT and so on, but I've never done that before and I'm quite busy, so you should do some research too while I'm working on it. I can provide the actual code of the occlusion pass (FBO parameters, OpenGL states) and how I actually render objects; that may be sufficient. Should I send it in a PM?

                • Re: [OpenGL] occlusion query extremely slow, conditional rendering broken
                  nowhere-01

                  Just tested Catalyst 13.4 on XP with conditional rendering... and it doesn't work. Same as before: it always passes with GL_QUERY_NO_WAIT. So it still isn't fixed.