I would recommend not using multiple OpenGL contexts for this particular use case. Every time you use an object on one shared context that was updated in the other, the driver needs to synchronize the two threads, which can be quite costly.
I would suggest that in your render thread, you create a PBO and map it. Use some form of signal in your application, such as an event or semaphore, to indicate that the buffer has been mapped and that the pointer to the PBO is valid. Then, in your loader thread, copy the data into the mapped PBO and signal back when the copy is done. The render thread can then unmap the PBO and call glTexSubImage2D to transfer the data into the texture.
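To make the handshake concrete, here is a rough sketch of how the two threads could coordinate. It is only an illustration under some assumptions: `UploadSlot`, `loader_thread` and `upload_via_mapped_buffer` are made-up names, a plain `std::vector` stands in for the PBO storage so the sketch runs without a GL context, and the real OpenGL calls are shown as comments at the points where they would go.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical handshake state shared by the render and loader threads.
struct UploadSlot {
    std::mutex m;
    std::condition_variable cv;
    std::uint8_t* ptr = nullptr; // pointer obtained from the map call
    std::size_t capacity = 0;    // size of the mapped range in bytes
    bool mapped = false;         // set by the render thread once ptr is valid
    bool filled = false;         // set by the loader once the copy is done
};

// Loader thread: makes no OpenGL calls, so it needs no context of its own.
void loader_thread(UploadSlot& slot, const std::vector<std::uint8_t>& pixels) {
    std::uint8_t* dst = nullptr;
    {
        std::unique_lock<std::mutex> lock(slot.m);
        slot.cv.wait(lock, [&] { return slot.mapped; });
        assert(pixels.size() <= slot.capacity);
        dst = slot.ptr;
    }
    // Copy outside the lock; the render thread does not touch the buffer
    // until it sees `filled`.
    std::copy(pixels.begin(), pixels.end(), dst);
    {
        std::lock_guard<std::mutex> lock(slot.m);
        slot.filled = true;
    }
    slot.cv.notify_all();
}

// Render thread side. Returns the staging contents so the result can be checked.
std::vector<std::uint8_t> upload_via_mapped_buffer(const std::vector<std::uint8_t>& pixels) {
    std::vector<std::uint8_t> staging(pixels.size()); // stand-in for PBO storage
    UploadSlot slot;
    std::thread loader(loader_thread, std::ref(slot), std::cref(pixels));

    // Real code: glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    //            glBufferData(GL_PIXEL_UNPACK_BUFFER, size, nullptr, GL_STREAM_DRAW);
    //            ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    {
        std::lock_guard<std::mutex> lock(slot.m);
        slot.ptr = staging.data();
        slot.capacity = staging.size();
        slot.mapped = true;
    }
    slot.cv.notify_all();

    // A real render loop would check `filled` once per frame instead of blocking.
    {
        std::unique_lock<std::mutex> lock(slot.m);
        slot.cv.wait(lock, [&] { return slot.filled; });
        slot.mapped = false; // the pointer is about to become invalid
    }
    loader.join();

    // Real code: glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    //            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
    //                            GL_RGBA, GL_UNSIGNED_BYTE, nullptr); // offset 0 into the PBO
    return staging;
}
```

The point of the design is that only the render thread ever talks to OpenGL; the loader just writes through a pointer it was handed.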
If you can, don't use the modified texture immediately. Try to delay use of the texture by a frame or two. This may help the driver prevent stalls between the upload and the use of the texture.
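One possible way to get that delay, sketched here as an assumption rather than the only approach, is to cycle through a small ring of textures: upload into one slot each frame and draw from the slot that was uploaded longest ago. The helper names `upload_slot` and `draw_slot` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Index of the texture to upload into on this frame.
// `count` is the number of textures in the ring, assumed to be at least 2.
std::size_t upload_slot(std::size_t frame, std::size_t count) {
    assert(count >= 2);
    return frame % count;
}

// Index of the texture to draw with on this frame: the next slot in the ring,
// which is the one whose upload was issued count - 1 frames earlier.
std::size_t draw_slot(std::size_t frame, std::size_t count) {
    assert(count >= 2);
    return (frame + 1) % count;
}
```

With three textures, the texture drawn on a given frame is the one whose upload was issued two frames before, so the driver has some slack to finish the transfer.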
I am going to try that next.
It contains a lot of performance data, and the authors suggest the same thing you do: map a pointer (or use pinned memory) and share it with the loader thread.
I note that they say that AMD GPUs couldn't do memory transfers and rendering simultaneously (at least with the drivers from a year ago). Is that still the case?
One thing reading that chapter has shown me is that OpenGL as an API can be very tricky because of all the performance traps, implementation differences, loose specifications, etc. I think this is one area where Direct3D does a much better job, as it is very 'strict' in design and all drivers are thoroughly tested by Microsoft to behave exactly the same way.