2 Replies Latest reply on Jan 5, 2011 12:18 PM by aqnuep

    Layered shadow map rendering



      I know I should try this first rather than ask, but I am interested in the performance characteristics of layered rendering on AMD GPUs.

      Taking the example of having 10 lights that need a shadow map, the classical method to render the shadow maps is the following:

      For each light, do the following:

      1. Get the list of objects that are inside the light's frustum.

      2. Attach the light's depth texture as render target.

      3. Render the determined objects to the shadow map.

      Now, I wonder whether layered rendering would be more efficient:

      1. Get the list of objects that are inside the frustum of any of the lights.

      2. Attach a depth texture array as layered render target.

      3. Render the determined objects using layered rendering: the geometry shader outputs every triangle to every layer whose corresponding light frustum contains that triangle.

      To do this, we need a geometry shader (an instanced geometry shader on GL 4.0 class GPUs, or a simple for loop on earlier generations) with all the light frustum matrices available through a uniform buffer or a texture buffer (the former being preferable).
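
As a rough illustration of the per-triangle routing in step 3, here is a CPU-side emulation in Python. It is not shader code: in a real geometry shader the equivalent would be writing `gl_Layer` and calling `EmitVertex()`/`EndPrimitive()` for each accepted layer. The function names, the row-major matrix layout, and the two example matrices are all assumptions made for this sketch, and the frustum test is a conservative one (a triangle is rejected only if all three vertices are outside the same clip plane).

```python
# CPU-side emulation of layered-rendering triangle routing (hypothetical names).
# Matrices are 4x4 row-major lists; points are (x, y, z) tuples with implicit w = 1.

def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 3D point, returning clip-space (x, y, z, w)."""
    x, y, z = v
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(4))

def outside_frustum(clip_verts):
    """Conservative test: True if every vertex is outside the same clip plane."""
    planes = (
        lambda c: c[0] < -c[3], lambda c: c[0] > c[3],
        lambda c: c[1] < -c[3], lambda c: c[1] > c[3],
        lambda c: c[2] < -c[3], lambda c: c[2] > c[3],
    )
    return any(all(p(c) for c in clip_verts) for p in planes)

def route_triangle(triangle, light_matrices):
    """Return [(layer, clip_verts)] for each light whose frustum may contain the triangle."""
    emitted = []
    for layer, m in enumerate(light_matrices):
        clip = [transform(m, v) for v in triangle]
        if not outside_frustum(clip):
            emitted.append((layer, clip))  # a shader would set gl_Layer = layer here
    return emitted

# Two assumed "light" matrices: a unit cube around the origin, and one around x = 5.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
SHIFTED = [[1, 0, 0, -5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Calling `route_triangle(tri, [IDENTITY, SHIFTED])` returns one entry per layer the triangle should be written to, so the output count per input triangle varies between zero and the number of lights.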

      I would expect layered rendering to provide much better performance, as vertex fetches (and possibly tessellation) are executed only once, and there are fewer state switches and draw commands.

      However, I'm worried about whether geometry shader throughput is an issue here:

      In the HD2000 programming guide you mention that only 1:1 and 1:4 input-to-output ratios use the fast path. This can be an issue, as these ratios usually won't be satisfied. Instanced geometry shaders can help here, but the programming guide mentions further limitations: using resources like uniform buffers or textures in the geometry shader disables the fast path as well.

      Can you give me some hints on how this could perform in practice on various hardware generations (HD2000, HD3000, HD4000, HD5000 series), and whether the idea can be improved further?

        • Layered shadow map rendering


          It is hard to answer this kind of request generically, but I would expect layered rendering with a GS to be quite efficient on recent hardware.

          One of the main benefits is the much lower CPU overhead, since you greatly reduce the number of draw calls (# of layers).

          Pierre B.

            • Layered shadow map rendering

              Yes, I do want to reduce the CPU overhead with layered rendering, but that is not the only goal.

              As I see it, I could also greatly reduce the amount of geometry that needs to be processed:

              Let's have 10 objects, each with 10000 triangles, and let's have 3 lights.

              Based on the culling, it turns out that:

              - light #1 affects objects #1, #3, #4 and #5

              - light #2 affects objects #2, #3, #6, #7, #9 and #10

              - light #3 affects objects #5, #6, #7, #8 and #10

              That means with the non-layered solution we have to process a total of 4+6+5 = 15 objects (150000 triangles).

              With the layered solution, this is reduced to 10 objects (100000 triangles).
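
Working the example through in a few lines of Python, with 10 objects of 10000 triangles each and the three per-light object lists given above. The variable names are just for this sketch; it counts only the geometry submitted for vertex processing, not what the geometry shader emits or what gets rasterized.

```python
TRIANGLES_PER_OBJECT = 10_000

# Objects affected by each light, taken from the culling result in the example.
lights = {
    1: {1, 3, 4, 5},
    2: {2, 3, 6, 7, 9, 10},
    3: {5, 6, 7, 8, 10},
}

# Non-layered: each object is submitted once for every light that affects it.
per_light_objects = sum(len(objs) for objs in lights.values())

# Layered: each object is submitted at most once, so count the union of all lists.
layered_objects = len(set().union(*lights.values()))

per_light_triangles = per_light_objects * TRIANGLES_PER_OBJECT
layered_triangles = layered_objects * TRIANGLES_PER_OBJECT

print(per_light_objects, per_light_triangles)  # 15 150000
print(layered_objects, layered_triangles)      # 10 100000
```

In this particular example the layered approach submits one third less input geometry; the saving grows with the amount of overlap between the lights' object lists and vanishes when no object is shared.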

              The only question is whether the additional overhead of using geometry shaders is still worth the reduction in the amount of geometry. With layered rendering each object is drawn at most once rather than once per light. Of course, no rasterization cost is saved, but that cannot be avoided anyway.