realhet
Miniboss

GL_QUADS: too slow performance vs. GL_TRIANGLE_STRIP

Hi,

I ran into a weird performance issue with GL_QUADS, and I'm looking for suggestions on how to do it better.

The problem: I have to draw very long curved tubes. Those tubes have millions of 'control points', so it is pretty obvious that the bottleneck is the hardware stage that produces the fragments from the visible triangles.

Version 1: The tube is drawn with 2 long GL_TRIANGLE_STRIPs (actually it's only one, with the parts separated by null, i.e. degenerate, triangles). The tube is approximated with a 4-sided polygon and the smoothing is done with a Phong shader, so it looks like a cylindrical object from a distance.

Version 2: The tube is built up with GL_QUADS. Each tube segment is represented by a quad that connects 2 adjacent control points with a 3D ray-traced capsule (cylinder + 2 half spheres).

The performance difference is a factor of 1.5: Version 1 is the faster one. I've tried to make Version 2 faster by reducing the fragment shader to nothing (discard every pixel) and outputting trivial coordinates from the vertex shader, but it's still 1.5x slower.

So I guess the problem is with GL_QUADS: Version 1 submits 2x as many vertices, yet it is 1.5x faster o.O. (Ver1: 4 strips, 8 vertices at each control point (half of the triangles are visible). Ver2: 1 quad, 4 vertices (all visible).)

My guess is that with quads there are not many shared edges between adjacent triangles, so the hardware can't go as fast as with 2 long strips.

Is there a solution for this? I'm reading bad things about GL_QUADS on the internet, but no solutions there. Maybe a geometry shader is better for this?

(Same results on Evergreen and GCN architectures: HD6970 & HD7770)

Thanks in advance!

5 Replies
nou
Exemplar

Don't use GL_QUADS. They are not supported in modern OpenGL. A geometry shader would just add more unnecessary overhead.

I've tried with GL_TRIANGLE_STRIP. Used an incrementing value to distinguish hidden 'quads' and discard them in the fragment shader. But the performance was the same as GL_QUADS.

Then I finally found what was wrong: I used a 1-component GL_UNSIGNED_SHORT array as the vertex buffer, and the elements not being 4-byte aligned caused the penalty. So I replaced it with GL_FLOAT (wasn't in the mood to play with GL_INT either, haha) and it worked: I got a massive 2x speedup. So in the end, Version 2 became 1.5x faster than Version 1, which is nice.
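The fix described above amounts to widening the 16-bit values to 32-bit floats on the CPU before uploading them. A minimal C++ sketch of that step, assuming the attribute data lives in a std::vector; the function name widenToFloat is hypothetical and not taken from the poster's code, and the alignment explanation itself is the poster's guess rather than a confirmed cause.

```cpp
#include <cstdint>
#include <vector>

// Widen a tightly packed 1-component 16-bit attribute array to 32-bit floats,
// so that every vertex element occupies 4 bytes in the vertex buffer.
// (Hypothetical helper; the original post does not show its code.)
std::vector<float> widenToFloat(const std::vector<std::uint16_t>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (std::uint16_t v : src) {
        // Lossless: every 16-bit integer is exactly representable as a float.
        dst.push_back(static_cast<float>(v));
    }
    return dst;
}
```

The widened array would then be uploaded with glBufferData and described with glVertexAttribPointer(location, 1, GL_FLOAT, GL_FALSE, 0, nullptr) instead of GL_UNSIGNED_SHORT, where location stands for whatever attribute index the shader uses. The buffer doubles in size, which is the trade-off for the faster fetch reported here.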

This "GL_QUADS is deprecated" story is maybe based on OpenGL ES, where everything was chopped down with a two-handed war axe. On Evergreen and GCN it works very well. Maybe on NVidia it is bad (as I see in other forums), so I'd better not use QUADS, as you mentioned.

0 Likes

I assume you are using glBegin(GL_QUADS) and glEnd(), as GL_QUADS does not exist for glDrawArrays, glDrawElements, etc. What nou meant is you shouldn't be using glBegin and glEnd, as this is the old slow path of submitting data to the GPU.

This "GL_QUADS is deprecated" story is maybe based on OpenGL ES, where everything was chopped down with a two-handed war axe. On Evergreen and GCN it works very well. Maybe on NVidia it is bad (as I see in other forums), so I'd better not use QUADS, as you mentioned.

GL_QUADS IS actually deprecated in desktop OpenGL as well and only works in a compatibility context, where all the old dusty and rusty stuff is included. Within a core context glBegin and glEnd won't work. If you want more speedup, use glDrawArrays or glDrawElements (in compatibility or core).
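One common way to keep quad-shaped data while staying within a core context is to leave the 4 vertices per quad as they are and draw them as indexed GL_TRIANGLES. A minimal C++ sketch of building such an index buffer, assuming the quads are stored back to back with 4 vertices each in GL_QUADS order; the function name quadsToTriangleIndices is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Build an index list that draws `quadCount` quads as 2 triangles each.
// Quad q is assumed to own vertices 4q .. 4q+3, listed around its perimeter.
std::vector<std::uint32_t> quadsToTriangleIndices(std::uint32_t quadCount) {
    std::vector<std::uint32_t> idx;
    idx.reserve(static_cast<std::size_t>(quadCount) * 6);
    for (std::uint32_t q = 0; q < quadCount; ++q) {
        const std::uint32_t b = q * 4;  // first vertex of this quad
        // First triangle: v0, v1, v2
        idx.push_back(b);
        idx.push_back(b + 1);
        idx.push_back(b + 2);
        // Second triangle: v0, v2, v3 (same winding as the first)
        idx.push_back(b);
        idx.push_back(b + 2);
        idx.push_back(b + 3);
    }
    return idx;
}
```

With the result uploaded to an element array buffer, the draw would be glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, nullptr). The vertex data stays at 4 vertices per quad; only the index buffer grows to 6 entries per quad.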

0 Likes

Hi,

As I've found out, the problem was with GL_UNSIGNED_SHORT used as the vertex buffer type. It halves the draw speed compared to using the classic GL_FLOAT. This might be the "element size must be 4-byte aligned" problem.

I don't use glBegin/glEnd. Those are slow as well, I know.

I also found out that GL_QUADS works as well as GL_TRIANGLE_STRIP. It is not worth using GL_TRIANGLE_STRIP with extra degenerate triangles just to avoid GL_QUADS. GL_TRIANGLES is slower for the same purpose as well. I think GL_QUADS is an important feature: you can draw a forest of trees with it. Point sprites are also made out of quads. It also uses 33% fewer resources than triangles (4 vertices per quad instead of 6 for two separate triangles). I have to work with tens of millions of quads (can't optimize them out, it's a machine toolpath), so the rasterizer is the bottleneck and the effect of the chosen primitive can be seen easily.

(Tested on HD6990 and HD7770, Cat 15.7, win7 64bit)

0 Likes

Just an update on the topic, which I found out after wasting an hour of my life:

OpenGL ES 2.0 (ANGLE 2.1.99f075dade7c) DOESN'T support GL_QUADS.

(The AMD OpenGL implementation supports it perfectly and with great speed)

I use ANGLE for compatibility reasons only. I'll have to emulate GL_QUADS with additional invisible triangles (using GL_TRIANGLE_STRIP), as I don't want to queue 1000s of draw calls; that would be very slow. (This time it is for drawing text on the screen, that's why I needed quads.)
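The emulation mentioned above can be sketched as an index buffer that visits each quad in strip order and stitches consecutive quads together with repeated indices, which produce zero-area (invisible) triangles. This is a minimal C++ sketch under stated assumptions: quads are stored back to back with 4 vertices each in GL_QUADS order, 16-bit indices are used, and the function name quadsToStripIndices is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Build one GL_TRIANGLE_STRIP index list covering `quadCount` separate quads.
// Quad q is assumed to own vertices 4q .. 4q+3 in perimeter (GL_QUADS) order.
// Returns an empty list if the vertices would not fit in 16-bit indices;
// the caller would then have to split the batch.
std::vector<std::uint16_t> quadsToStripIndices(unsigned quadCount) {
    std::vector<std::uint16_t> idx;
    if (quadCount == 0 || quadCount > 16384u) {
        return idx;
    }
    idx.reserve(quadCount * 6u - 2u);
    for (unsigned q = 0; q < quadCount; ++q) {
        const unsigned b = q * 4u;  // first vertex of this quad
        if (q > 0) {
            // Two extra indices create degenerate triangles between quads.
            // Adding an even number keeps the winding of later quads intact.
            idx.push_back(idx.back());
            idx.push_back(static_cast<std::uint16_t>(b));
        }
        // Perimeter order v0,v1,v2,v3 becomes strip order v0,v1,v3,v2.
        idx.push_back(static_cast<std::uint16_t>(b));
        idx.push_back(static_cast<std::uint16_t>(b + 1u));
        idx.push_back(static_cast<std::uint16_t>(b + 3u));
        idx.push_back(static_cast<std::uint16_t>(b + 2u));
    }
    return idx;
}
```

All quads can then go out in a single glDrawElements(GL_TRIANGLE_STRIP, count, GL_UNSIGNED_SHORT, nullptr) call. The cost is 6 indices per quad (minus 2 for the first), the same index count as indexed GL_TRIANGLES, so the choice between the two is mostly a matter of what the driver handles better.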