Hi,
I know I should try this myself before asking, but I am interested in the performance characteristics of layered rendering on AMD GPUs.
Take the example of 10 lights that each need a shadow map. The classical method of rendering the shadow maps is the following:
For each light, do the following:
1. Get the list of objects that are inside the light's frustum.
2. Attach the light's depth texture as render target.
3. Render the determined objects to the shadow map.
Now, I wonder whether layered rendering would be more efficient:
1. Get the list of objects that are inside the frustum of any of the lights.
2. Attach a depth texture array as layered render target.
3. Render the determined objects using layered rendering: the geometry shader outputs each triangle to every layer whose corresponding light frustum contains that triangle.
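To make step 3 concrete, here is a rough CPU-side emulation in C++ of the per-triangle routing the geometry shader would perform. It is only a sketch under assumptions: the names (Vec4, Mat4, layersForTriangle) are made up for illustration, matrices are column-major view-projection matrices, and "contains" is approximated by the usual conservative clip-space test (a triangle is rejected only if all three vertices lie outside the same clip plane).

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix, indexed as m[column][row] (OpenGL convention).
using Mat4 = std::array<std::array<float, 4>, 4>;

// Multiply a column-major matrix by a column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    return {
        m[0][0] * v.x + m[1][0] * v.y + m[2][0] * v.z + m[3][0] * v.w,
        m[0][1] * v.x + m[1][1] * v.y + m[2][1] * v.z + m[3][1] * v.w,
        m[0][2] * v.x + m[1][2] * v.y + m[2][2] * v.z + m[3][2] * v.w,
        m[0][3] * v.x + m[1][3] * v.y + m[2][3] * v.z + m[3][3] * v.w,
    };
}

// True if all three vertices satisfy the given "outside this plane" predicate.
bool allOutside(const std::array<Vec4, 3>& c, bool (*outside)(const Vec4&)) {
    return outside(c[0]) && outside(c[1]) && outside(c[2]);
}

// Conservative reject: every vertex is outside the same clip plane.
bool triviallyOutside(const std::array<Vec4, 3>& c) {
    return allOutside(c, [](const Vec4& p) { return p.x < -p.w; }) ||
           allOutside(c, [](const Vec4& p) { return p.x >  p.w; }) ||
           allOutside(c, [](const Vec4& p) { return p.y < -p.w; }) ||
           allOutside(c, [](const Vec4& p) { return p.y >  p.w; }) ||
           allOutside(c, [](const Vec4& p) { return p.z < -p.w; }) ||
           allOutside(c, [](const Vec4& p) { return p.z >  p.w; });
}

// Returns the layer indices (one layer per light) the triangle would be
// emitted to. In the real shader each returned index would be written to
// gl_Layer before emitting the three vertices.
std::vector<std::size_t> layersForTriangle(const std::array<Vec4, 3>& tri,
                                           const std::vector<Mat4>& lightViewProj) {
    std::vector<std::size_t> layers;
    for (std::size_t layer = 0; layer < lightViewProj.size(); ++layer) {
        const Mat4& m = lightViewProj[layer];
        const std::array<Vec4, 3> clip = {{
            transform(m, tri[0]), transform(m, tri[1]), transform(m, tri[2])
        }};
        if (!triviallyOutside(clip)) {
            layers.push_back(layer);
        }
    }
    return layers;
}
```

With an instanced geometry shader, each invocation would handle one iteration of the loop over lights; on earlier hardware the loop would run inside a single invocation, so the declared maximum output would have to cover 3 vertices per light.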
To do this we need a geometry shader (an instanced geometry shader on GL 4.0 class GPUs, or a simple for loop on earlier generations) that has all the light frustum matrices available through a uniform buffer or a texture buffer (the former being preferable).
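As a small illustration of the uniform buffer side, the following C++ sketch mirrors a hypothetical std140 block holding one view-projection matrix per light. The block and member names are assumptions made up for this example; the sizes follow from std140 rules (a mat4 array element occupies 64 bytes) and from the 16384-byte minimum the OpenGL specification guarantees for GL_MAX_UNIFORM_BLOCK_SIZE.

```cpp
#include <cstddef>

// Light count taken from the 10-light example.
constexpr std::size_t kLightCount = 10;

// Minimum value the GL specification guarantees for GL_MAX_UNIFORM_BLOCK_SIZE.
constexpr std::size_t kMinMaxUniformBlockSize = 16384;

// CPU-side mirror of a hypothetical GLSL block:
//   layout(std140) uniform LightMatrices { mat4 lightViewProj[10]; };
// Each mat4 is stored as four vec4 columns, i.e. 16 floats = 64 bytes.
struct LightMatricesBlock {
    float lightViewProj[kLightCount][16];
};

static_assert(sizeof(LightMatricesBlock) == kLightCount * 64,
              "std140 array of mat4 should be tightly packed at 64 bytes per element");
static_assert(sizeof(LightMatricesBlock) <= kMinMaxUniformBlockSize,
              "block must fit in the smallest guaranteed uniform block size");
```

For 10 lights this comes to 640 bytes, so the matrices fit comfortably in a single uniform block and the size limit alone does not force a texture buffer.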
I would expect layered rendering to perform much better, since vertex fetch (and possibly tessellation) is executed only once per object, and there are fewer state switches and draw commands.
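The expected saving in draw commands and render target switches can be estimated with a simple count. The C++ sketch below is only a back-of-the-envelope model with made-up names (PassCost, classicCost, layeredCost). It assumes one draw call per visible object, at most 32 lights, and a per-object bitmask saying which light frusta the object intersects. It deliberately says nothing about geometry shader throughput.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PassCost {
    std::size_t drawCalls;          // draw commands issued
    std::size_t renderTargetBinds;  // render target (re)attachments
};

// visibility[i] has bit l set if object i intersects the frustum of light l.
// Classical method: one pass per light, drawing every object visible to it.
PassCost classicCost(const std::vector<std::uint32_t>& visibility, unsigned lightCount) {
    PassCost cost{0, 0};
    for (unsigned light = 0; light < lightCount; ++light) {
        ++cost.renderTargetBinds;
        for (std::uint32_t mask : visibility) {
            if (mask & (1u << light)) {
                ++cost.drawCalls;
            }
        }
    }
    return cost;
}

// Layered method: a single pass into the texture array, drawing every object
// that is visible to at least one light exactly once.
PassCost layeredCost(const std::vector<std::uint32_t>& visibility) {
    PassCost cost{0, 1};
    for (std::uint32_t mask : visibility) {
        if (mask != 0) {
            ++cost.drawCalls;
        }
    }
    return cost;
}
```

For example, with 3 lights and four objects whose masks are 0x3, 0x1, 0x0 and 0x7, the classical method issues 6 draw calls and 3 render target binds, while the layered method issues 3 draw calls and 1 bind. The vertex work saved is proportional to the same difference.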
However, I'm worried about whether geometry shader throughput is an issue here:
In the HD2000 programming guide you mention that only 1:1 and 1:4 input-to-output ratios use the fast path. This could be a problem, as the ratio here usually won't match either of those. Instanced geometry shaders may help, but the programming guide mentions a further limitation: using resources such as uniform buffers or textures in the geometry shader disables the fast path as well.
Can you give me some hints on how this would perform in practice on the various hardware generations (HD2000, HD3000, HD4000, HD5000 series), and whether the idea can be improved further?