I'm creating my graphics resources in another thread, with both a D3D and a GL back end. The D3D one is stable and performs well, but the GL one does not hold up in comparison.
Here's a brief description of what I do (on Windows 7 SP1):
I took a similar (but much easier) approach for D3D. The frame rates are smooth except for the first object creation, where I get a spike in frame time.
In OpenGL, however, performance is worse, and on the first object creation I get a HUGE spike.
I've also noticed that even if I don't create objects from the loader context and instead create them normally in the main thread, merely creating the second context shared with the first one leads to horrible performance for all subsequent frames. So the slowdown doesn't seem to depend on whether I actually load anything on the worker thread/context.
In the image below, I'm running the 13.9 drivers with OpenGL, and the second shared context is also created. Note the frame-time graph, which stays at about 12 ms for the whole run, not just during loading:
In the image below, I've turned off threaded loading and also the creation of the secondary context. Check out the improvement:
Today I upgraded to the 13.12 drivers. Check out the improvement in frame times for threaded object creation:
Compare these to the similar D3D implementation:
It looks like the driver is constantly doing some overhead work to sync shared resources among contexts throughout the whole lifetime of the application.
So is this the AMD driver's fault? Or am I doing something wrong in my OpenGL method?
I wouldn't recommend using "loader threads" with multiple contexts in this manner. You are correct - the driver needs to do quite a bit of extra work when it detects that there are contexts current in multiple threads in order to keep objects in sync. At best, this will add some synchronization overhead between contexts, and at worst it will drop performance on your main thread even when the secondary thread is not active.
If you really want to stream data from another thread, my recommendation would be to use a single context, create the objects (textures, buffers, etc.) in the main thread, map the buffers, and pass the pointers to your worker threads. At this point, the worker threads can fill the buffers in parallel with data read from disk, received from the network, procedurally generated, etc. When they're done, they signal the main thread. Upon receiving this signal, the main thread unmaps the buffer and the data is ready to use. For textures, simply use the buffer bound to PIXEL_UNPACK_BUFFER to stage the texture data.
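As a rough illustration of that hand-off, here is a minimal C++ sketch. It is not taken from any real renderer: `UploadJob`, `fill_job` and `run_streaming_demo` are hypothetical names, and a plain `std::vector` stands in for the mapped buffer so the example runs without a GL context. The comments mark where the real `glMapBufferRange`, `glUnmapBuffer` and `glTexSubImage2D` calls would presumably go on the main thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job record shared between the main thread and one worker.
struct UploadJob {
    void*             dst  = nullptr;  // pointer obtained by mapping the buffer on the main thread
    std::size_t       size = 0;        // number of bytes the worker must write
    std::atomic<bool> ready{false};    // set by the worker once dst is fully written
};

// Worker side: writes into the mapped memory only and never makes a GL call.
void fill_job(UploadJob& job, const std::vector<std::uint8_t>& source) {
    assert(job.dst != nullptr && job.size <= source.size());
    std::memcpy(job.dst, source.data(), job.size);     // stands in for disk/network/procedural data
    job.ready.store(true, std::memory_order_release);  // signal the main thread
}

// Main-thread side: runs one hand-off and reports whether the staged bytes match.
bool run_streaming_demo() {
    std::vector<std::uint8_t> source(1024);
    for (std::size_t i = 0; i < source.size(); ++i)
        source[i] = static_cast<std::uint8_t>(i);

    // ASSUMPTION: stand-in for the mapped storage. With a real context this would be
    //   glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    //   glBufferData(GL_PIXEL_UNPACK_BUFFER, size, nullptr, GL_STREAM_DRAW);
    //   job.dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size, GL_MAP_WRITE_BIT);
    std::vector<std::uint8_t> staging(source.size());

    UploadJob job;
    job.dst  = staging.data();
    job.size = staging.size();

    std::thread worker(fill_job, std::ref(job), std::cref(source));

    // A real main loop would check this flag once per frame and keep rendering;
    // here we simply spin until the worker has finished.
    while (!job.ready.load(std::memory_order_acquire))
        std::this_thread::yield();
    worker.join();

    // With a real context, the main thread would now call
    //   glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    //   glTexSubImage2D(..., nullptr);  // last argument is an offset into the bound PBO
    return staging == source;
}
```

Calling `run_streaming_demo()` from `main` should return `true`. The point of the design is that only the main thread ever owns a context, so the driver has nothing to synchronize across contexts; the worker just sees ordinary writable memory.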
Do not use multiple contexts like this. It will work, but it won't give you the performance you're looking for.
Thanks
Unfortunately for me, that means I have to implement it the hard way for OpenGL.
I'm still curious to know what makes Direct3D more robust in this particular case, and why OpenGL drivers can't just create objects in another thread the way D3D11 does.
I'm going to go out on a limb and blame the OpenGL specification. It's sometimes not fair to compare OpenGL and Direct3D in certain regards, because D3D implementations have strict requirements and less wiggle room for interpretation.