Cross-posted in a nearby topic.
I've found that the DX SDK sample named OIT11 performs terribly on my new Radeon HD 5770 (about 12 fps with the newest 9.12 drivers). The problem seems to be in the prefix sum pass, where an RWBuffer is used. Each ::Dispatch() call cuts performance roughly in half, so after log(N) passes the demo runs at 12 fps. Can someone explain how the Mecha demo implements its A-buffer and how to avoid the performance drop? All I know is that Mecha uses an RWTexture2D instead of an RWBuffer resource.
Hi.
I am quite new here,
but this seems to be a very nice post for us.
I wrote a simple implementation of an OIT algorithm; it runs in real time.
The demo can be downloaded from the following link:
Same here! I am just a new member, but I will try to download this. Joe, what demo is this?
I did some research and found that the OIT11 implementation is naive, written as a "how not to do it" example. An efficient prefix sum algorithm (by Blelloch) should be used instead, and each thread block should consist of at least 64 threads. Also, clearing the RWBuffer drops performance significantly with current drivers (although clearing isn't necessary for OIT).
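For anyone who wants to see what the Blelloch scan looks like, here is a minimal CPU reference sketch in C++. It is not the OIT11 or Mecha code. The function name blelloch_exclusive_scan and the idea of feeding it per-pixel fragment counts are my own assumptions for illustration. Each iteration of an inner loop is independent of the others in the same pass, so on the GPU one iteration would map to one thread, with a barrier between passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum using the work-efficient up-sweep/down-sweep scheme.
// Hypothetical helper: the input is assumed to be per-pixel fragment counts,
// the output is the start offset of each pixel's fragment list.
std::vector<uint32_t> blelloch_exclusive_scan(const std::vector<uint32_t>& counts) {
    const std::size_t n = counts.size();
    if (n == 0) return {};

    // Pad with zeros up to the next power of two.
    std::size_t m = 1;
    while (m < n) m <<= 1;
    assert((m & (m - 1)) == 0);  // the sweeps below rely on a power-of-two length

    std::vector<uint32_t> data(counts);
    data.resize(m, 0);

    // Up-sweep (reduce): build partial sums in place, log2(m) passes.
    for (std::size_t stride = 1; stride < m; stride <<= 1) {
        for (std::size_t i = 2 * stride - 1; i < m; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    // Down-sweep: clear the root, then push the partial sums back down.
    data[m - 1] = 0;
    for (std::size_t stride = m / 2; stride >= 1; stride >>= 1) {
        for (std::size_t i = 2 * stride - 1; i < m; i += 2 * stride) {
            const uint32_t left = data[i - stride];
            data[i - stride] = data[i];  // left child takes the parent's value
            data[i] += left;             // right child takes parent + old left
        }
    }

    data.resize(n);  // drop the padding
    return data;
}
```

The total number of additions is on the order of N rather than N log N. In a compute shader version, as far as I understand it, a whole block of elements would be scanned in group-shared memory inside a single dispatch, so the number of Dispatch() calls no longer grows with every pass.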