1) In Brook+, tiling is applied to all streams that are not accessed via scatter/gather, and the runtime does it implicitly. A tiling block is 8x8 elements, which matches the wavefront.
2) Yes, this is correct. Global buffer access is currently done as an uncached read/write.
3) Please see the thread with ID 115872, 'calculating the bottleneck'.
4) This has been a longstanding request and we are working with the documentation folks on this.
5) Please look at lds_transpose.
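To make point 1 more concrete, here is a rough sketch in C of what an 8x8 tiled layout can look like compared with a plain row-major layout. This is only an illustration: the function names (`linear_offset`, `tiled_offset`), the row-major ordering of tiles, and the row-major ordering of elements inside a tile are all assumptions made for the example, not the actual layout used by the Brook+ runtime or the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tile edge: 8x8 elements, as stated in point 1. */
#define TILE_DIM 8

/* Plain row-major offset of element (x, y) in a stream of the given width. */
size_t linear_offset(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/*
 * Hypothetical tiled offset of element (x, y).
 * Assumptions (not taken from the Brook+ runtime):
 *   - width is a multiple of TILE_DIM,
 *   - tiles are stored one after another in row-major order,
 *   - elements inside a tile are stored in row-major order.
 */
size_t tiled_offset(size_t x, size_t y, size_t width)
{
    assert(width % TILE_DIM == 0);

    size_t tiles_per_row = width / TILE_DIM;
    size_t tile_index    = (y / TILE_DIM) * tiles_per_row + (x / TILE_DIM);
    size_t within_tile   = (y % TILE_DIM) * TILE_DIM + (x % TILE_DIM);

    /* Each tile holds TILE_DIM * TILE_DIM = 64 consecutive elements. */
    return tile_index * (TILE_DIM * TILE_DIM) + within_tile;
}
```

With this layout, the 64 elements of one 8x8 block sit next to each other in memory, whereas in the linear layout two vertically adjacent elements are a full row apart.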
For the 1st problem: As you said, the tiling operation is done by the runtime. If I access a stream in both gather mode and normal mode in an application, how does the runtime deal with this situation?