Data exchange via LDS

Discussion created by Gunter on Sep 13, 2010
Latest reply on Sep 13, 2010 by Jawed

I want to exchange data between the threads of a wavefront via LDS, but I'm not sure I'm getting maximum performance. Unfortunately, Stream Profiler does not work on my platform (Windows 7, 64-bit).

I have read that the LDS consists of 32 banks, each 32 bits wide, and that a bank cannot process more than one access per clock. So I'm trying to compute each thread's DWORD offset from its thread ID.
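To make the bank arithmetic concrete, here is a minimal Python model of the rule described above. It assumes 32 banks of 4 bytes each, with consecutive DWORDs interleaved across consecutive banks (bank = DWORD index mod 32). That interleaving is an assumption for illustration, not something confirmed in this thread. The names `bank_of` and `conflict_degree` are hypothetical helpers, and the model ignores any special handling the hardware may apply when several threads use the identical address.

```python
# Minimal model of LDS bank mapping (assumption: 32 banks, 4 bytes wide,
# consecutive DWORDs interleaved across consecutive banks).
from collections import Counter

NUM_BANKS = 32   # from the post: 32 banks
BANK_WIDTH = 4   # bytes; each bank is 32 bits wide

def bank_of(byte_address: int) -> int:
    """Bank a byte address falls into under simple DWORD interleaving."""
    return (byte_address // BANK_WIDTH) % NUM_BANKS

def conflict_degree(byte_addresses) -> int:
    """Largest number of accesses in the group that hit the same bank.

    With one access per bank per clock, this approximates how many clocks
    the group needs (1 means conflict-free). Identical-address broadcast
    is not modeled.
    """
    counts = Counter(bank_of(a) for a in byte_addresses)
    return max(counts.values())

# Thread t writes DWORD t: consecutive threads hit consecutive banks.
unit_stride = [t * 4 for t in range(32)]
# Thread t writes DWORD 2*t: only even banks are used, two threads per bank.
stride_two = [t * 2 * 4 for t in range(32)]

print(conflict_degree(unit_stride))  # 1
print(conflict_degree(stride_two))   # 2
```

Under this model, a DWORD offset equal to the thread ID spreads 32 threads across all 32 banks, while a stride of two DWORDs doubles up on half of them.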

But what exactly happens when my IL kernel executes an lds_store instruction? Will there be four sets of 16 write accesses, with thread IDs:

First set: IDs 15..0

Second set: IDs 31..16, and so forth? Or does it work completely differently?
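The hypothesis in the question can be checked against a given addressing scheme with a small sketch. It assumes, purely as the question supposes, that a 64-thread wavefront issues its LDS writes as four sets of 16 consecutive thread IDs; whether the hardware actually groups threads this way is the open question here. It reuses the DWORD-interleaving assumption (bank = DWORD offset mod 32), and `per_set_conflicts` and `SET_SIZE` are hypothetical names.

```python
# Sketch of the grouping hypothesized in the question: a 64-thread wavefront
# issuing LDS writes in sets of 16 consecutive thread IDs. This is an
# assumption to test against, not documented hardware behavior.
from collections import Counter

NUM_BANKS = 32
WAVEFRONT_SIZE = 64
SET_SIZE = 16  # hypothetical group size taken from the question

def bank_of_dword(dword_offset: int) -> int:
    """Bank for a DWORD offset, assuming simple interleaving."""
    return dword_offset % NUM_BANKS

def per_set_conflicts(offset_of_thread, set_size=SET_SIZE):
    """For each set of `set_size` consecutive thread IDs, return the largest
    number of threads in that set that map to the same bank."""
    result = []
    for start in range(0, WAVEFRONT_SIZE, set_size):
        banks = Counter(bank_of_dword(offset_of_thread(t))
                        for t in range(start, start + set_size))
        result.append(max(banks.values()))
    return result

print(per_set_conflicts(lambda t: t))       # [1, 1, 1, 1]
print(per_set_conflicts(lambda t: 2 * t))   # [1, 1, 1, 1]
print(per_set_conflicts(lambda t: 32 * t))  # [16, 16, 16, 16]
```

The grouping matters for the answer: with 16-thread sets, a stride of two DWORDs is still conflict-free, but `per_set_conflicts(lambda t: 2 * t, set_size=32)` gives `[2, 2]`, so the same offsets would cause two-way conflicts if 32 threads accessed LDS together.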

Any help greatly appreciated.