Cross-posted in a nearby topic.
I've found that on my new Radeon HD 5770 the DX SDK sample named OIT11 has really terrible performance (about 12 fps with the newest 9.12 drivers). The problem seems to be in the prefix sum pass, where an RWBuffer is used. Each ::Dispatch() call cuts the frame rate roughly in half, so after log(N) passes the demo runs at 12 fps. Can someone explain to me how the Mecha demo implements its A-buffer, and how one can avoid this performance drop? All I know is that Mecha uses an RWTexture2D resource instead of an RWBuffer.
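To make the log(N) pass count concrete, here is a rough CPU-side sketch of a multi-pass prefix sum. It is not the OIT11 shader code. It assumes a doubling (Hillis-Steele style) scan, where each loop iteration stands in for one ::Dispatch() call, and the name `prefix_sum_passes` is made up for this illustration.

```python
def prefix_sum_passes(counts):
    """Inclusive prefix sum computed in doubling passes.

    Returns (result, passes). Each pass is a stand-in for one Dispatch();
    this is an assumed structure, not the actual OIT11 implementation.
    """
    src = list(counts)
    n = len(src)
    passes = 0
    stride = 1
    while stride < n:
        # Every element adds the value `stride` slots to its left, reading
        # from the previous pass's buffer and writing a fresh one.
        dst = [src[i] + (src[i - stride] if i >= stride else 0)
               for i in range(n)]
        src = dst
        stride *= 2
        passes += 1
    return src, passes


# Example: per-pixel fragment counts turned into running totals.
result, passes = prefix_sum_passes([1, 2, 3, 4, 5, 6, 7, 8])
print(result, passes)  # [1, 3, 6, 10, 15, 21, 28, 36] 3
```

With 8 elements this takes 3 passes, and in general ceil(log2(N)). If every pass really costs as much as I am seeing, the total slowdown grows with the number of dispatches rather than with the amount of arithmetic.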