From the AMD Accelerated Parallel Processing OpenCL Programming Guide, section 6.2, page 6-10:
The LDS contains 32-banks, each bank is four bytes wide and 256 bytes deep; the bank address is determined by bits 6:2 in the address.
and:
Bank conflicts are determined by what addresses are accessed on each half wavefront boundary. Threads 0 through 31 are checked for conflicts as are threads 32 through 63 within a wavefront.
This implies that each bank is 4 bytes wide, so the optimal access pattern would be for each thread to access a consecutive uint.
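To make that reading concrete, here is a minimal sketch in plain Python (a host-side model, not OpenCL) of the bank mapping as I understand it from the quote. The names `bank_of`, `NUM_BANKS`, and the access pattern are my own, chosen for illustration only:

```python
NUM_BANKS = 32  # "32-banks", from the quoted guide

def bank_of(byte_address):
    """Bank index taken from address bits 6:2, per the quoted guide."""
    return (byte_address >> 2) & (NUM_BANKS - 1)

# Assumed access pattern: thread t reads the uint at byte offset 4 * t.
banks = [bank_of(4 * t) for t in range(32)]
distinct_banks = len(set(banks))

print(banks)           # bank touched by each of threads 0..31
print(distinct_banks)  # 32 means no two threads share a bank
```

Under this model, the 32 threads of a half-wavefront each land on a different bank, which is why sequential uints look conflict-free.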
So far so good, but then this comes up:
Ensure, as much as possible, that the memory requests generated from a quarter-wavefront avoid bank conflicts by using unique address bits 6:2. A simple sequential address pattern, where each work-item reads a float2 value from LDS, generates a conflict-free access pattern on the AMD Radeon HD 7XXX GPU.
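This seems to contradict the earlier quotes. With one float2 (8 bytes) per work-item, the 32 work-items of a half-wavefront span 256 bytes, but bits 6:2 repeat every 128 bytes, so each value of bits 6:2 is accessed twice per half-wavefront. According to the quotes above, that should cause bank conflicts.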
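The arithmetic can be checked with a small Python model of the quoted rules. This is only a sketch under my own assumptions: the function `worst_bank_load` is hypothetical, and it treats a float2 read as two 4-byte accesses that are both counted in the same conflict check, which the guide does not confirm.

```python
from collections import Counter

NUM_BANKS = 32  # from the quoted guide

def bank_of(byte_address):
    """Bank index taken from address bits 6:2."""
    return (byte_address >> 2) & (NUM_BANKS - 1)

def worst_bank_load(element_bytes, group_size, wavefront_size=64):
    """Largest number of 4-byte accesses hitting one bank within any
    group of `group_size` consecutive threads (assumed conflict window)."""
    worst = 0
    for start in range(0, wavefront_size, group_size):
        hits = Counter()
        for t in range(start, start + group_size):
            base = t * element_bytes  # sequential addressing
            for offset in range(0, element_bytes, 4):
                hits[bank_of(base + offset)] += 1
        worst = max(worst, max(hits.values()))
    return worst

half_float2 = worst_bank_load(8, 32)     # float2, checked per half-wavefront
quarter_float2 = worst_bank_load(8, 16)  # float2, checked per quarter-wavefront
half_uint = worst_bank_load(4, 32)       # uint, checked per half-wavefront

print(half_float2, quarter_float2, half_uint)  # prints: 2 1 1
```

In this model, sequential float2 reads hit every bank twice if conflicts are checked per half-wavefront, but only once if they are checked per quarter-wavefront.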
Which is it? Do sequential uint2s cause bank conflicts? Or are the first two quotes technically accurate, but it would be better to say "Threads 0 through 15, 16 through 31, 32 through 47, and 48 through 63 are checked for conflicts within a wavefront", since each wavefront is executed in quarter-wavefront units?