7 Replies Latest reply on Feb 21, 2011 10:13 PM by Cric

    CS5: using the fast memory path with RWTexture2D

    Cric

      Hi,

      I'm manipulating a texture resource ('RWTexture2D'; not declared 'globallycoherent') with a compute shader on a Radeon HD 5770, and I'm hitting a bandwidth bottleneck. The shader uses CompletePath.

      CompletePath should not be necessary here, since every texel is read and written only once. Reads and writes of two different thread groups never overlap, and threads never read from the texture after it has been written. (This is, however, hard for a compiler or driver to figure out, as the reads and writes are heavily scattered.)

      What can I do to make the driver choose FastPath?

      What I already figured out:

      • The driver will choose FastPath for reading and writing if I use a 'RWStructuredBuffer' resource instead of 'RWTexture2D'. The overall performance degrades, however: other shaders require a texture memory layout and perform poorly with a buffer memory layout. Furthermore, a 2-dimensional memory layout simplifies the shader code.
      • The driver will always choose FastPath for read-only textures ('Texture2D'). So, if I allocate a second texture, create a shader resource view of the original texture, bind it as 'Texture2D', and write to the new texture (bound as 'RWTexture2D') instead of the original one, the driver will choose FastPath for reading. Although writing is still done via CompletePath, this gives a large performance benefit. However, I'm short on VRAM and simply cannot afford to double the amount of required memory.
      • Adding the 'globallycoherent' storage class does not change the shader's assembly code, so the driver seems to treat the texture as globallycoherent all the time.
      So: Why does the driver choose CompletePath for my write operations when the data is declared as 'RWTexture2D', but FastPath when it is declared as 'RWStructuredBuffer'? How can I benefit from FastPath with a 'RWTexture2D'?
      I'd be glad if anyone could provide some information on how the driver decides whether to choose FastPath or CompletePath when compiling a shader and how I can affect this decision through HLSL.
      (The OpenCL Programming Guide only says that the compiler is "conservative" and uses CompletePath if the shader contains atomic instructions. I don't use any.)
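
      For reference, here is a minimal sketch of the second workaround from the list above (read through a 'Texture2D' view, write to a separate 'RWTexture2D'). All names, the R32_UINT format, the thread group size and the mirror addressing are assumptions made for illustration; the real shader's scattered addressing is different.

```hlsl
// Hypothetical names throughout; an R32_UINT texture format is assumed.
Texture2D<uint>   gSource : register(t0); // SRV on the original texture (read-only)
RWTexture2D<uint> gTarget : register(u0); // UAV on the second texture (write-only)

// Placeholder for the real scattered addressing. Mirroring the coordinates
// is a bijection, so every texel is still written exactly once.
uint2 ScatterTarget(uint2 p, uint2 size)
{
    return size - 1 - p;
}

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint2 size;
    gSource.GetDimensions(size.x, size.y);

    // Skip threads that fall outside the texture when its size is not
    // a multiple of the thread group size.
    if (any(id.xy >= size))
        return;

    // Each thread reads one texel from the read-only view ...
    uint value = gSource[id.xy];

    // ... and writes one texel of the second texture (placeholder computation).
    gTarget[ScatterTarget(id.xy, size)] = value + 1u;
}
```

      The in-place variant I would actually like to use declares only the 'RWTexture2D' and both reads from it and writes to it, which is the case where I see CompletePath for reads as well as writes.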