We are rendering large amounts of point-cloud data and have run into an issue with our AMD GPU (R7 370).
While everything is fine on the Nvidia GPU, on the AMD GPU large parts of the point cloud flicker or are simply black. A closer examination with CodeXL and RenderDoc (awesome tools, by the way) revealed that the data is correctly loaded into memory and passed to the vertex shader, but the shader stops receiving colour values after the 4096th colour in the buffer. This means that while rendering, the first 4096 vertices are coloured correctly and everything beyond that flickers or is just black.
RenderDoc also reports the following two warnings (VERTEX_ATTRIB is the colour attribute):
glDrawArrays uses input attribute 'VERTEX_ATTRIB' which is specified as 'type = GL_UNSIGNED_BYTE size = 3'; this combination is not a natively supported input attribute type
glDrawArrays uses input attribute 'VERTEX_ATTRIB' with stride '3' that is not optimally aligned; consider aligning on a 4-byte boundary
Neither the warnings nor the behaviour are present on the Nvidia GPU (GTX 980).
If we change stride and size to 4, the warnings and the flickering disappear, but inflating the already large data set with an extra byte per colour seems a bit excessive to me, especially as passing colours as RGB888 shouldn't be too uncommon.
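For clarity, the size/stride 4 variant just repacks the tight RGB stream with one padding byte per colour before upload. A minimal sketch of what we do (hypothetical helper, names made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand tightly packed RGB888 (stride 3) into RGBX8888 (stride 4),
// so every colour attribute starts on a 4-byte boundary.
std::vector<uint8_t> padRgbToRgbx(const std::vector<uint8_t>& rgb) {
    std::vector<uint8_t> out;
    out.reserve(rgb.size() / 3 * 4);
    for (size_t i = 0; i + 2 < rgb.size(); i += 3) {
        out.push_back(rgb[i]);     // R
        out.push_back(rgb[i + 1]); // G
        out.push_back(rgb[i + 2]); // B
        out.push_back(0xFF);       // padding byte (wasted space)
    }
    return out;
}
```

This is exactly the 33% size increase I'd like to avoid.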
Are we doing something wrong on the OpenGL side and hitting unspecified behaviour (that happens to work on Nvidia GPUs by chance), or is this a driver issue? It sounds similar to this thread, but that issue was solved in 2012. Have any of you encountered something similar recently?
We assembled a small example program so you can have a look at the code and build it yourself. You can find it here.
If you see a flickering white dot, you have the same issue. If it stays stable, it works correctly.
GL_VENDOR: ATI Technologies Inc.
GL_VERSION: 3.2.13507 Core Profile Forward-Compatible Context 23.20.15027.2002
GL_RENDERER: AMD Radeon (TM) R7 370 Series
Operating System: Win10 Pro x64
In my experience, passing colors as RGB888 is extremely uncommon, because I thought it was impossible!
n.b. NVidia drivers are famous for accepting invalid GL program behavior and doing a fairly good job of tolerating it. I wouldn't be surprised whatsoever if their driver was silently allocating the extra 33% space required and inserting the padding bytes for you!
I haven't double-checked with the actual GL spec documents, but I would've guessed that a 3-byte attribute stride is illegal. Using the D3D API there's actually no way to request a 3 byte color attribute.
If I remember correctly, AMD hardware requires buffer loads to be at least 2-byte aligned, so a 3-byte attribute wouldn't be supported in hardware -- supporting it would require quite complex vertex-shader code that does two 4-byte loads and then swizzles / shifts the bytes out of them... so technically possible, but at a high computational cost.
You could actually implement this yourself -- make a buffer of 32-bit integers and put your existing data into it, e.g. 4 colours packed into three elements: [RGBR][GBRG][BRGB]. Then bind it to your vertex shader as a "buffer texture" (or read-only SSBO) and use gl_VertexID to fetch and unpack the data yourself.
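The indexing maths for that unpack is straightforward. A sketch in plain C++ so it can run standalone (in GLSL you'd do the same arithmetic with texelFetch / an SSBO and gl_VertexID; this assumes little-endian byte packing within each 32-bit word):

```cpp
#include <cstdint>
#include <vector>

struct Rgb { uint8_t r, g, b; };

// Colours are tightly packed at 3 bytes each inside a buffer of 32-bit
// words ([RGBR][GBRG][BRGB] holds 4 colours in 3 words). Byte k of the
// stream lives in words[k / 4] at bit position (k % 4) * 8.
Rgb fetchColour(const std::vector<uint32_t>& words, uint32_t vertexId) {
    auto byteAt = [&](uint32_t k) -> uint8_t {
        return static_cast<uint8_t>((words[k / 4] >> ((k % 4) * 8)) & 0xFF);
    };
    uint32_t base = vertexId * 3; // 3 bytes per colour
    return { byteAt(base), byteAt(base + 1), byteAt(base + 2) };
}
```

Note that some colours straddle a word boundary, so the GLSL version would fetch up to two words per vertex -- that's the computational cost mentioned above.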
Alternatively, are there any other vertex attributes that you could fit into this extra "padding" byte?
Or, would you be better off using 16-bit colors instead of 24-bit? I've had good results using the YCoCg color space, with 6-5-5 bits per channel, manually packed into an integer with bit-shifting code.
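The bit packing itself is just a few shifts and masks. A sketch (the RGB-to-YCoCg conversion is omitted; the inputs here are assumed to be already quantised to 6/5/5 bits):

```cpp
#include <cstdint>

// Pack three quantised channels (Y: 6 bits, Co: 5 bits, Cg: 5 bits)
// into one 16-bit value: [YYYYYY|CoCoCoCoCo|CgCgCgCgCg].
uint16_t pack655(uint16_t y, uint16_t co, uint16_t cg) {
    return static_cast<uint16_t>(((y & 0x3F) << 10) |
                                 ((co & 0x1F) << 5) |
                                 (cg & 0x1F));
}

// Inverse: extract the three channels again.
void unpack655(uint16_t p, uint16_t& y, uint16_t& co, uint16_t& cg) {
    y  = (p >> 10) & 0x3F;
    co = (p >> 5)  & 0x1F;
    cg = p         & 0x1F;
}
```

Two bytes per colour also keeps every attribute 2-byte aligned, which sidesteps the alignment problem entirely.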
Sorry for the late answer, I was traveling for our institute.
Thank you very much for your input. We guessed that the Nvidia driver does something different, and it may well be what you suggest, but whatever it does, it does not show up in RenderDoc when inspecting the buffers, so I can neither confirm nor deny that it adds extra padding.
Regarding the GL specification: I found no restrictions in the documents, and well-respected tutorials like the OpenGL SuperBible use RGB888, so I assume it's not strictly illegal. I also asked on the OpenGL forum, and they found nothing wrong with it either, though they said warnings about suboptimal performance from the driver should be expected.
Another thanks for your elaborate suggestions for solving this. We will have a look at what is applicable to our use case. Until now we had the luxury of being able to load the data more or less directly from HDD to VRAM in bulk, without much processing.
I will keep you posted about any progress we make.
Okay, we found something.
If we change the OpenGL profile from core to compatibility, everything works as expected.
It's no real solution, but probably a reasonable workaround for us, and for everyone else who experiences this behaviour, until it is fixed.