I ran into a weird performance issue with GL_QUADS, and I'm looking for suggestions on how to do this better.
The problem: I have to draw very long curved tubes. Those tubes have millions of 'control points', so I'm fairly sure the bottleneck is the hardware stage that produces fragments from the visible triangles (triangle setup and rasterization).
Version 1: The tube is drawn with 2 long GL_TRIANGLE_STRIPs (actually a single strip, with the two parts joined by NULL, i.e. degenerate, triangles). The tube's cross-section is approximated with a 4-sided polygon and the smoothing is done with a Phong shader, so it looks like a cylindrical object from a distance.
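To make the "one strip separated with NULL triangles" idea concrete, here is a minimal sketch of joining two index lists into a single strip. It assumes the NULL triangles are ordinary zero-area (degenerate) triangles made by repeating indices; the function name `stitchStrips` is hypothetical and not taken from the actual code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: joins two triangle-strip index lists into one strip
// by inserting repeated indices, which produce zero-area triangles that
// the rasterizer does not turn into fragments.
std::vector<std::uint32_t> stitchStrips(const std::vector<std::uint32_t>& a,
                                        const std::vector<std::uint32_t>& b) {
    if (a.empty()) return b;
    if (b.empty()) return a;

    std::vector<std::uint32_t> out(a);
    out.push_back(a.back());      // repeat the last index of the first strip
    if (a.size() % 2 != 0) {
        out.push_back(a.back());  // extra repeat so the second strip starts at
                                  // an even position and keeps its winding
    }
    out.push_back(b.front());     // repeat the first index of the second strip
    out.insert(out.end(), b.begin(), b.end());
    return out;
}
```

The result can be drawn with a single glDrawElements(GL_TRIANGLE_STRIP, ...) call. Primitive restart would be another way to split a strip, but that is not what the description above suggests.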
Version 2: The tube is built from GL_QUADS. Each tube segment is represented by a single quad that connects 2 adjacent control points, and a 3D ray-traced capsule (cylinder + 2 half-spheres) is rendered on it.
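For readers unfamiliar with ray-traced capsules, the per-pixel work can be sketched roughly as below. This is a CPU-side C++ illustration of a commonly used ray/capsule intersection, not the actual shader from Version 2; `Vec3`, `rayCapsule` and all parameter names are assumptions made for the example, and the ray direction is assumed to be normalized.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns the distance along the ray (origin ro, unit direction rd) to the
// capsule with end points pa, pb and radius r, or -1 if the ray misses.
// Sketch only: rays exactly parallel to the capsule axis are not handled.
float rayCapsule(Vec3 ro, Vec3 rd, Vec3 pa, Vec3 pb, float r) {
    Vec3 ba = sub(pb, pa);
    Vec3 oa = sub(ro, pa);
    float baba = dot(ba, ba);
    float bard = dot(ba, rd);
    float baoa = dot(ba, oa);
    float rdoa = dot(rd, oa);
    float oaoa = dot(oa, oa);

    // Quadratic for the infinite cylinder around the axis pa -> pb.
    float a = baba - bard * bard;
    float b = baba * rdoa - baoa * bard;
    float c = baba * oaoa - baoa * baoa - r * r * baba;
    float h = b * b - a * c;
    if (h < 0.0f) return -1.0f;

    float t = (-b - std::sqrt(h)) / a;
    float y = baoa + t * bard;           // position of the hit along the axis
    if (y > 0.0f && y < baba) return t;  // hit on the cylindrical body

    // Otherwise test the half-sphere at the nearer end point.
    Vec3 oc = (y <= 0.0f) ? oa : sub(ro, pb);
    float b2 = dot(rd, oc);
    float c2 = dot(oc, oc) - r * r;
    float h2 = b2 * b2 - c2;
    if (h2 > 0.0f) return -b2 - std::sqrt(h2);
    return -1.0f;
}
```

In a shader the same math would run once per fragment of the quad, with the two control points passed in as the capsule end points.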
The performance difference is 150%: Version 1 is the faster one, by about 1.5x. I've tried to make Version 2 faster by effectively removing the fragment shader (discarding every pixel) and outputting trivial coordinates from the vertex shader, but it's still slower by the same margin.
So I guess the problem is GL_QUADS itself: Version 1 uses 2x as many vertices, yet it is 1.5x faster o.O (Ver1: 4 strips, 8 vertices at each control point, half of the triangles visible. Ver2: 1 quad, 4 vertices, all visible.)
My guess is that with quads there are not many shared edges between adjacent triangles, so the hardware can't go as fast as it does with 2 long strips.
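One variation that could be tested against this guess is feeding the same 4 corners per segment as indexed GL_TRIANGLES instead of GL_QUADS. The sketch below only builds the index list; it assumes the vertex buffer keeps its current layout of 4 corners per quad in perimeter order, the name `quadsToTriangleIndices` is hypothetical, and whether this is actually faster on this hardware has not been measured here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds indices that draw `quadCount` independent
// quads as two triangles each. Assumes 4 vertices per quad, stored in the
// perimeter order v0, v1, v2, v3 that GL_QUADS uses.
std::vector<std::uint32_t> quadsToTriangleIndices(std::uint32_t quadCount) {
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6);
    for (std::uint32_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = q * 4;
        // First triangle: v0, v1, v2
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        // Second triangle: v0, v2, v3 (shares the diagonal v0-v2)
        indices.push_back(base + 0);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    return indices;
}
```

The list would be uploaded to a GL_ELEMENT_ARRAY_BUFFER and drawn with glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, 0). With 32-bit indices this covers up to about a billion quads, which is enough for millions of control points.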
Is there a solution for this? I've read bad things about GL_QUADS on the internet, but found no solutions there. Would a geometry shader be better for this?
(Same results on Evergreen and GCN architectures. HD6970 & HD7770)
Thanks in advance!