Archives Discussions

Cric · ‎01-19-2011

Hi,

I'm manipulating a texture resource ('RWTexture2D'; not declared 'globallycoherent') using a compute shader on a Radeon HD 5770, and in doing so I'm hitting a bandwidth bottleneck. The shader is using CompletePath.

CompletePath should not be necessary here, as every texel is read and written only once. Reads and writes of two thread groups never overlap, and threads never read a texel after it has been written. (This is, however, hard for a compiler or driver to figure out, as the reads and writes are heavily scattered.)

What can I do to make the driver choose FastPath?

What I already figured out:

The driver will choose FastPath for reading and writing if I use a 'RWStructuredBuffer' resource instead of 'RWTexture2D'. The overall performance, however, degrades: other shaders require a texture memory layout and perform awfully with a buffer memory layout. Furthermore, a 2-dimensional memory layout simplifies the shader code.
The driver will always choose FastPath for read-only textures ('Texture2D'). So, if I allocate a second texture, create a shader resource view from the original texture, bind it as 'Texture2D' and write to the new texture (bound as 'RWTexture2D') instead of the original one, the driver will choose FastPath for reading. Although writing is still done via CompletePath, this comes with a great performance benefit — but I'm short on VRAM and I just cannot afford to double the amount of required memory.
Adding the 'globallycoherent' storage class does not change the shader's assembly code, so the driver apparently treats the resource as globallycoherent all the time.

So: Why does the driver choose CompletePath for my write operations when the data is declared as 'RWTexture2D', but FastPath when it is declared as 'RWStructuredBuffer'? How can I benefit from FastPath with a 'RWTexture2D'?

I'd be glad if anyone could provide some information on how the driver decides whether to choose FastPath or CompletePath when compiling a shader and how I can affect this decision through HLSL.

(The OpenCL Programming Guide only says that the compiler is "conservative" and uses CompletePath if there are atomic instructions in the shader. I don't use any.)

Cric · ‎01-22-2011

*bump*

Why is RWStructuredBuffer many times faster than RWTexture? What's the reasoning to always choose CompletePath for RWTextures?

Cric · ‎02-01-2011

*bump*

Cric · ‎02-10-2011

*bump*

Why do 'RWTexture2D' and 'RWBuffer' use CompletePath while 'RWStructuredBuffer' uses FastPath?

Shrinker · ‎02-11-2011

You say OpenCL, so this subforum might yield an answer for you: http://forums.amd.com/devforum/categories.cfm?catid=390&entercat=y

Cric · ‎02-11-2011

Originally posted by Shrinker: "You say OpenCL, so this subforum might yield an answer for you: http://forums.amd.com/devforum/categories.cfm?catid=390&entercat=y"

No, I only mentioned that I read the OpenCL Programming Guide (because there is no DirectCompute programming guide). This is an HLSL / compute shader / DirectCompute-only question.

Cric · ‎02-18-2011

*bump*

Cric · ‎02-21-2011

The difference between RWByteAddressBuffer/RWStructuredBuffer (very fast) and RWBuffer/RWTexture (awfully slow) is that Direct3D assumes RWBuffer and RWTexture to be typed: Writing to RWBuffer or RWTexture may require data conversion (e.g. when writing a float to DXGI_FORMAT_R8_UNORM).
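To make the conversion concrete, here is a minimal CPU-side sketch in C++ of what a typed store of a float to an 8-bit UNORM format has to do, as far as I understand the Direct3D conversion rules (NaN becomes 0, the value is clamped to [0, 1], scaled by 255 and rounded). The helper names are hypothetical and not part of any Direct3D API; the real conversion happens in hardware on the typed write path.

```cpp
#include <cmath>
#include <cstdint>

// CPU-side model of a typed store to an 8-bit UNORM target.
// float_to_unorm8 is a hypothetical helper name, not a Direct3D function.
std::uint8_t float_to_unorm8(float x) {
    if (std::isnan(x)) return 0;   // NaN is written as 0
    if (x < 0.0f) x = 0.0f;        // clamp to the representable range [0, 1]
    if (x > 1.0f) x = 1.0f;
    // scale to [0, 255], add 0.5 and truncate (round to nearest)
    return static_cast<std::uint8_t>(x * 255.0f + 0.5f);
}

// Reverse direction, as performed by a typed load: map [0, 255] back to [0, 1].
float unorm8_to_float(std::uint8_t v) {
    return static_cast<float>(v) / 255.0f;
}
```

An untyped resource such as RWByteAddressBuffer or RWStructuredBuffer stores the raw bits instead, so no step like this is needed on a write.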

One possibility to bypass this (proposed by Microsoft for use in in-place image editing) is using the uint version of these objects — but not even that changes anything.
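For reference, the idea behind the uint approach is that the shader reads and writes whole 32-bit words and does the format packing itself. A rough CPU-side sketch in C++ of that packing, assuming an R8G8B8A8 layout with red in the least significant byte (the struct and function names are hypothetical and only for illustration):

```cpp
#include <cstdint>

// Four 8-bit channels of one texel (hypothetical helper type).
struct Rgba8 {
    std::uint8_t r, g, b, a;
};

// Pack the channels into one 32-bit word, red in the lowest byte.
// Assumption: this matches the byte order of an R8G8B8A8 texel.
std::uint32_t pack_rgba8(Rgba8 c) {
    return  static_cast<std::uint32_t>(c.r)
         | (static_cast<std::uint32_t>(c.g) << 8)
         | (static_cast<std::uint32_t>(c.b) << 16)
         | (static_cast<std::uint32_t>(c.a) << 24);
}

// Split a 32-bit word back into its four channels.
Rgba8 unpack_rgba8(std::uint32_t w) {
    return Rgba8{
        static_cast<std::uint8_t>(w & 0xFFu),
        static_cast<std::uint8_t>((w >> 8) & 0xFFu),
        static_cast<std::uint8_t>((w >> 16) & 0xFFu),
        static_cast<std::uint8_t>((w >> 24) & 0xFFu)};
}
```

In a shader the same bit operations would run on the value loaded from and stored to the uint resource, so the hardware no longer has to convert formats on the write.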

Evidently, typed accesses cause the shader compiler to always choose CompletePath, even if the shader accesses the resource without overlapping reads and writes, as read-only or write-only, or through uint.

RWBuffer and RWTexture are so awfully slow on AMD hardware that they're practically unusable for real-time applications. Can someone tell me any rationale for this or should I just file a performance bug?

CS5: using the fast memory path with RWTexture2D