
Reasons for glDrawElements(BaseVertex) having huge driver overhead?

Question asked by asylum29 on Jul 20, 2017
Latest reply on Aug 11, 2017 by asylum29

I'm currently working on refactoring a complex and likely CPU/driver-bound renderer.

In a nutshell:

- all rendering is done on a background thread (consumer) which is fed through a blocking queue of "rendering tasks"

- the main thread (producer) issues a lot of small-batch draw calls like "moveto(a, b) - lineto(c, d) - lineto(e, f) - ... - flush()", inserting a task into the queue when necessary

- I designed the abstract interface so that it resembles modern graphics APIs (Metal/Vulkan), so I have buffers, textures, renderpasses and graphics pipelines

- the corresponding GL objects are cached whenever possible (VAOs, programs, framebuffers), so that GL object creation is minimized

- GL state changes are not managed (with a few exceptions like framebuffer, VAO and program changes)

- buffer data uploads are optimized with GL_MAP_UNSYNCHRONIZED_BIT; should the buffer become full, I do a glFenceSync+glWaitSync (doesn't happen too often, though)
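The unsynchronized-mapping scheme in the last point is essentially CPU-side bookkeeping over a ring of buffer memory. A minimal sketch of that bookkeeping is below; the `ring_t`/`ring_alloc` names are hypothetical (not from the original renderer), and the actual GL calls appear only in comments since they need a live context:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring allocator over a GL buffer mapped with
 * GL_MAP_UNSYNCHRONIZED_BIT. The caller maps the returned range via
 *   glMapBufferRange(GL_ARRAY_BUFFER, off, bytes,
 *                    GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
 * Assumes bytes <= capacity. */
typedef struct {
    size_t head;      /* next free byte offset   */
    size_t capacity;  /* total buffer size       */
} ring_t;

/* Returns the byte offset to write at. Sets *wrapped when allocation
 * restarted at offset 0 -- the point where the renderer must issue
 * glFenceSync after the last draw reading the front of the buffer and
 * wait on it before overwriting that data. */
static size_t ring_alloc(ring_t *r, size_t bytes, int *wrapped)
{
    *wrapped = 0;
    if (r->head + bytes > r->capacity) {
        r->head = 0;      /* wrap: caller syncs with the GPU here */
        *wrapped = 1;
    }
    size_t off = r->head;
    r->head += bytes;
    return off;
}
```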


Despite my efforts, the rendering is incredibly slow so far. VTune shows that the majority of the CPU time is spent in glDrawElementsBaseVertex:

[VTune screenshot]
Same thing on Intel cards. It pretty much looks like a driver-limited case; the funny thing, however, is that the old DX9 implementation is about 200x faster (hell, even GDI is faster).


So my question is: what might cause a drawing command to have such overhead? Or in other words: what state changes should I look out for to avoid this?


UPDATE: also tried buffer orphaning and GL_STREAM_DRAW, to no effect...
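For reference, the orphaning variant mentioned in the update usually looks like the fragment below (illustrative only, since it assumes a live GL context and a previously created VBO `vbo` of `size` bytes):

```c
/* Orphaning: passing NULL to glBufferData lets the driver detach the old
 * storage (still referenced by in-flight draws) and hand back a fresh
 * allocation instead of stalling the CPU. */
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);   /* orphan */

void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
/* ... write the new vertex data into dst ... */
glUnmapBuffer(GL_ARRAY_BUFFER);
```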