nowhere-01
Journeyman III

[OpenGL] occlusion query extremely slow, conditional rendering broken

Occlusion Query Test conditions:

Windows 7 x64,

Catalyst 13.1,

Gigabyte HD 6670 (bought especially for testing),

Occlusion query FBO size 256x256,

About 10-12 objects of different size and distance passing frustum test to be rendered for occlusion test,

Every object about 2-3k triangles,

Rendering with glDrawElements, no deprecated stuff.

Result:

Putting glFinish and a timer after the occlusion query rendering shows that the query is ready in 25 ms on average. It seems heavily fillrate-limited, because rendering bounding boxes instead doesn't help, and about 50 small billboards rendered for lens-flare occlusion take 1-2 ms to finish. It's terrible, but it gets worse if I don't force the occlusion query to finish and instead check whether it's ready on the next frame, like this:

loop() {
    checkOcclusionQuery();    // get occlusion query results from the previous frame, if available
    renderToOcclusionQuery();
    renderTheScene();
}

This loop runs at 30 fps, so every frame takes 33 ms to finish. Enough for our query, right? But it's not ready by the next frame. In fact, it takes 3-4 frames of 33 ms each until the occlusion query is ready: 105 ms on average. It's ridiculous. 25 ms would be unacceptable even for 10 times as many objects as I render, and 105 ms marks this as a broken piece of functionality. It seems the occlusion query result is being delayed by other commands pushed into the pipeline, which makes it useless.

Conditional Rendering:

I've seen that this has already been reported, but it was described poorly. A bit of detail: conditional rendering always passes if you use it with GL_QUERY_NO_WAIT, and it stalls the GPU to the point of crashing the driver if you use GL_QUERY_WAIT. Forcibly waiting for the occlusion query to finish before using conditional rendering doesn't make it work. It's completely broken.

P.S.: As a developer, I'm really pissed off with your current implementation of OpenGL. Since I dealt with AMD cards 3 years ago, you haven't improved much. Your drivers are still full of critical issues, your employees never visit the OpenGL.org forums anymore, and I have no desire to support your hardware.

0 Likes
10 Replies
gsellers
Staff

Hi,

nowhere-01 wrote:

This loop runs at 30 fps, so every frame takes 33 ms to finish. Enough for our query, right? But it's not ready by the next frame. In fact, it takes 3-4 frames of 33 ms each until the occlusion query is ready.

That is actually expected behavior. When the driver submits a command buffer to the operating system, it sits in a software queue that's at least a frame deep, usually more. Also, once the command buffer hits the hardware, there's generally one in flight and one in the hardware queue. Depending on what your application does, it can easily add up to 3-4 frames of latency.
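The queue depth described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not driver behaviour: `MockGpu`, `latencyFrames`, and `framesUntilFirstResult` are hypothetical names, and a fixed per-query latency stands in for the software and hardware queues. In real OpenGL code the issue step would be glBeginQuery/glEndQuery and the poll would be glGetQueryObjectuiv with GL_QUERY_RESULT_AVAILABLE.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical stand-in for the driver/GPU queue: a query result becomes
// readable only `latencyFrames` frames after the query was issued.
struct MockGpu {
    int latencyFrames;
    int currentFrame = 0;

    struct Pending {
        int issuedFrame;
        std::uint32_t samples;
    };
    std::deque<Pending> pending;

    // Models glBeginQuery / draw / glEndQuery.
    void issueQuery(std::uint32_t samplesThatWillPass) {
        pending.push_back({currentFrame, samplesThatWillPass});
    }

    // Models a non-blocking availability check followed by a result read.
    // Returns nothing if the oldest query has not finished yet.
    std::optional<std::uint32_t> pollOldest() {
        if (pending.empty()) return std::nullopt;
        if (currentFrame - pending.front().issuedFrame < latencyFrames) return std::nullopt;
        std::uint32_t samples = pending.front().samples;
        pending.pop_front();
        return samples;
    }

    void endFrame() { ++currentFrame; }
};

// Runs the "poll, issue, render" loop from the original post and returns the
// frame index on which the first result could be read, or -1 if none arrived.
int framesUntilFirstResult(int latencyFrames, int maxFrames) {
    assert(latencyFrames >= 0);
    MockGpu gpu{latencyFrames};
    for (int frame = 0; frame < maxFrames; ++frame) {
        if (gpu.pollOldest()) return frame;  // result from an earlier frame
        gpu.issueQuery(100);                 // this frame's occlusion pass
        gpu.endFrame();                      // rest of the scene, then present
    }
    return -1;
}
```

With an assumed latency of 3 frames at 33 ms per frame, the first result is readable on frame 3, roughly 99 ms after it was issued, which is in line with the 3-4 frames reported in the original post. An application that polls this way would keep using the last known visibility for each object until a newer result arrives.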

nowhere-01 wrote:

Conditional Rendering:

I've seen that this has already been reported, but it was described poorly. A bit of detail: conditional rendering always passes if you use it with GL_QUERY_NO_WAIT, and it stalls the GPU to the point of crashing the driver if you use GL_QUERY_WAIT. Forcibly waiting for the occlusion query to finish before using conditional rendering doesn't make it work. It's completely broken.

Yes, we have seen that report and are working on a fix. We'll get it into a driver ASAP.

nowhere-01 wrote:

P.S.: As a developer, I'm really pissed off with your current implementation of OpenGL. Since I dealt with AMD cards 3 years ago, you haven't improved much. Your drivers are still full of critical issues, your employees never visit the OpenGL.org forums anymore, and I have no desire to support your hardware.

I'm very sorry to hear that. We do read opengl.org forums and our employees do post there (just not always identifying themselves as AMD employees). However, I assure you that if you post here about an AMD OpenGL specific issue, you will get a response from me.

Cheers,

Graham

0 Likes

Do you think that the initial 25 ms is acceptable for a light test scene? You didn't comment on that part directly. For comparison, on a GTX 560 Ti the same occlusion query finishes in less than 1 ms; you can't even measure it with clock(). And the thing is, occlusion queries are used this way: you render to occlusion, you render the rest of the stuff, and then you ask for results. On any modern GPU with a sane number of objects you can be sure it's going to finish within one frame. The 6670 should be close to a GTX 260, but it performs more like a GeForce 8400. It's about 5-7 times slower than the GTX 560 Ti, and occlusion queries worked better on FX series cards. Not to mention that I render to the occlusion query only objects that are visible in the frustum and don't take up a lot of the screen (they are somewhat distant). Now, for AMD GPUs I am forced to put glFinish at the end of the current frame to make sure the occlusion query will be ready. And I plan to just turn it off if the timeouts are too big for several frames, because there's no point in it as an optimisation tool on your hardware right now. Even if it's twice as fast on your higher-end GPUs, it's still ridiculous. You should fix it.
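The fallback mentioned in this post (turning occlusion culling off when results stay late for several frames) could look roughly like the sketch below. It is an illustration only: the class name `OcclusionFallback` and both thresholds are hypothetical, and the post does not say how the age of a result would actually be measured.

```cpp
#include <cassert>

// Hypothetical helper: disables occlusion culling once query results have
// been older than `maxAgeFrames` for `strikesToDisable` consecutive frames.
class OcclusionFallback {
public:
    OcclusionFallback(int maxAgeFrames, int strikesToDisable)
        : maxAge_(maxAgeFrames), limit_(strikesToDisable) {
        assert(strikesToDisable > 0);
    }

    // Call once per frame with the age, in frames, of the newest result
    // that could be read this frame.
    void onFrame(int resultAgeFrames) {
        if (!enabled_) return;          // stays off once disabled
        if (resultAgeFrames > maxAge_) {
            if (++strikes_ >= limit_) enabled_ = false;
        } else {
            strikes_ = 0;               // a timely result resets the count
        }
    }

    bool enabled() const { return enabled_; }

private:
    int maxAge_;
    int limit_;
    int strikes_ = 0;
    bool enabled_ = true;
};
```

Counting consecutive late frames rather than reacting to a single one avoids switching culling off because of one slow frame. Whether a disabled state should ever be re-enabled is left open here.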

0 Likes

nowhere-01 wrote:

Do you think that the initial 25 ms is acceptable for a light test scene?

Without knowing what the frame does, it's hard to comment. However, for a simple scene that performance does seem low. There may well be some kind of performance issue here, but that is somewhat orthogonal to the number of frames in flight.

nowhere-01 wrote:

The thing is, occlusion queries are used this way: you render to occlusion, you render the rest of the stuff, and then you ask for results. On any modern GPU with a sane number of objects you can be sure it's going to finish within one frame.

That's actually not really true. For example, see this chapter in GPU Gems 2 (which is available on the NVIDIA website): http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html. In particular:

GPU Gems 2, Chapter 6:

This method works well if the tested object is really complex, but step 5 involves waiting until the result of the query actually becomes available. Since, for example, Direct3D allows a graphics driver to buffer up to three frames of rendering commands, waiting for a query results in potentially large delays.

nowhere-01 wrote:

Now, for AMD GPUs I am forced to put glFinish at the end of the current frame to make sure the occlusion query will be ready.

That would be really bad for performance and would mean that your application doesn't work well on your customers' machines, even if we improve occlusion query response times. That really is the worst case scenario.

nowhere-01 wrote:

You should fix it.

We're on it... step (1) is knowing there's a problem - thanks for reporting it. Is there any chance you could share your application? I'd love to see how much faster we can get it to go.

Cheers,

Graham

0 Likes

You are referring to an article that is 7+ years old. Do you compete with 7-year-old GPUs? I've given you an example of how occlusion queries work on a modern mid-range GPU: the delay is negligible. No one would use it as an optimisation tool if it really had a 3-4 frame delay on the majority of hardware. Have you seen objects popping up 3-4 frames late? I've watched it for hours while trying to make occlusion queries work properly on an AMD card. It's unacceptable. On any graphics programming forum you always see this algorithm mentioned: you render to occlusion at the beginning of the frame, you render the rest of the passes, you get the occlusion query results. That's how things are done now, not 7 years ago.

I am aware that putting in glFinish is the worst thing. I would never put it in a release, but the funny thing is that with glFinish it feels better on the 6670: it's much less sluggish (it can barely make it to 30 fps, where a GTX 260 does 70-80), and the occlusion query doesn't lag.

I'll try to put together something like a small demo of spheres occluding each other with GLUT and so on. But I've never done that and I'm quite busy, so you should do some research too while I'm working on it. I can provide you with the actual code of the occlusion pass (FBO parameters, OpenGL states) and how I actually render objects; it may be sufficient. Should I send it in a PM?

0 Likes

Hi,

nowhere-01 wrote:

You are referring to an article that is 7+ years old. Do you compete with 7-year-old GPUs? I've given you an example of how occlusion queries work on a modern mid-range GPU: the delay is negligible.

Actually, things work today pretty much as they did 7+ years ago. GPU pipelines are really, really deep - actually deeper than they were a few years ago. By definition, getting occlusion query results from rendering involves waiting for that rendering to reach the end of the pipeline, and waiting for those results will cause a bubble. This is the reason for the GL_QUERY_RESULT_AVAILABLE query and for the existence of conditional rendering in the first place - to make that bubble smaller.
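As background on how conditional rendering shrinks that bubble, the two wait modes can be modelled as a small decision function. This is a simplified model of the behaviour the OpenGL specification allows, not driver code and not a diagnosis of the bug reported in this thread; `WaitMode`, `QueryState`, `Outcome`, and `conditionalRender` are hypothetical names. The key point is that with GL_QUERY_NO_WAIT the GL may execute the draw unconditionally when the result is not yet available, so a query that is still several frames deep in the queue will not cull anything.

```cpp
// Stand-ins for GL_QUERY_WAIT and GL_QUERY_NO_WAIT.
enum class WaitMode { Wait, NoWait };

struct QueryState {
    bool resultAvailable;    // has the query finished on the GPU?
    unsigned samplesPassed;  // final sample count (known only once finished)
};

enum class Outcome { Draw, Skip, StallThenDraw, StallThenSkip };

// Decides what happens to draws issued between glBeginConditionalRender and
// glEndConditionalRender in this simplified model.
Outcome conditionalRender(WaitMode mode, const QueryState& q) {
    if (!q.resultAvailable) {
        // NoWait: the GL is allowed to render as if the query had passed.
        if (mode == WaitMode::NoWait) return Outcome::Draw;
        // Wait: the pipeline waits for the result before deciding.
        return q.samplesPassed != 0 ? Outcome::StallThenDraw
                                    : Outcome::StallThenSkip;
    }
    // Result already available: both modes decide immediately.
    return q.samplesPassed != 0 ? Outcome::Draw : Outcome::Skip;
}
```

In this model an occluded object is skipped without a stall only when the result is already available, which is why query latency matters even when conditional rendering itself works correctly.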

For a good overview of how command buffers (and potentially frames) are stacked up by the OS and graphics hardware, take a look at GPUView: http://graphics.stanford.edu/~mdfisher/GPUView.html. You can even download it (http://msdn.microsoft.com/en-us/performance/cc825801.aspx) and run your application to see how it's behaving.

nowhere-01 wrote:

I'll try to put together something like a small demo of spheres occluding each other with GLUT and so on. But I've never done that and I'm quite busy, so you should do some research too while I'm working on it. I can provide you with the actual code of the occlusion pass (FBO parameters, OpenGL states) and how I actually render objects; it may be sufficient. Should I send it in a PM?

That would be really helpful, yes. You can email me directly at graham <dot> sellers <at> amd <dot> com.

Cheers,

Graham

0 Likes

Sent you some code.

0 Likes

It's been a week since I reported this issue and supplied you with code. You haven't contacted me since and haven't responded in this thread. Are you planning to fix it?

0 Likes

Hi,

Yes, we've been working on it. Here is what we have found:

  • We are unable to reproduce any hang using conditional rendering.
  • We have discovered that under certain circumstances, the driver does indeed ignore conditional rendering commands and always considers the occlusion query to have a non-zero result.

We have fixed the issue where the driver ignores conditional rendering commands and will get that fix into a driver ASAP. With luck, it will also resolve the hang issue you are experiencing although we cannot validate that as we have not been able to reproduce the problem.

Thanks,

Graham

0 Likes

So it's been more than a month since this issue was reported. Has it been fixed in the beta drivers? How soon should we expect a stable driver with this fix (and some other OpenGL issues fixed)?

0 Likes
nowhere-01
Journeyman III

Just tested Catalyst 13.4 on XP with conditional rendering... and it doesn't work. It's all the same: it always passes with GL_QUERY_NO_WAIT. So, again, you didn't fix it.

0 Likes