
GCN LDS Bank Optimization 4-byte vs 8-byte Memory Access Patterns

Question asked by optimiz3 on Feb 5, 2016
Latest reply on Feb 7, 2016 by realhet

From the AMD Accelerated Parallel Processing OpenCL Programming Guide, section 6.2, page 6-10:

 

The LDS contains 32-banks, each bank is four bytes wide and 256 bytes deep; the bank address is determined by bits 6:2 in the address.

 

and:

 

Bank conflicts are determined by what addresses are accessed on each half wavefront boundary. Threads 0 through 31 are checked for conflicts as are threads 32 through 63 within a wavefront.

 

This would imply that each bank is 4 bytes wide, meaning the optimal access pattern would be for each thread to access a consecutive uint.
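To make that reading concrete, here is a minimal sketch of the bank mapping as I understand it from the quoted text. The helper names (`lds_bank`, `uint_banks`) are my own and purely illustrative; this is a host-side Python model of the address arithmetic, not anything taken from the guide or measured on hardware.

```python
NUM_BANKS = 32   # "32-banks", per the quoted guide text
BANK_WIDTH = 4   # bytes per bank, per the quoted guide text


def lds_bank(byte_addr):
    """Bank index taken from address bits 6:2 (hypothetical helper)."""
    return (byte_addr >> 2) & (NUM_BANKS - 1)


def uint_banks(first_thread, count=32):
    """Banks touched when thread t reads the uint at byte offset 4 * t."""
    return [lds_bank(BANK_WIDTH * t)
            for t in range(first_thread, first_thread + count)]


# Each half-wavefront (threads 0-31 and threads 32-63) touches every
# bank exactly once, so under this model there is nothing to conflict.
first_half = uint_banks(0)
second_half = uint_banks(32)
```

Under this model, byte address 128 wraps back around to bank 0, which is why the second half-wavefront lands on the same 32 banks as the first.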

 

So far so good, but then this comes up:

 

Ensure, as much as possible, that the memory requests generated from a quarter-wavefront avoid bank conflicts by using unique address bits 6:2. A simple sequential address pattern, where each work-item reads a float2 value from LDS, generates a conflict-free access pattern on the AMD Radeon HD 7XXX GPU.

This appears to contradict the earlier quotes: with sequential float2 reads, each half-wavefront accesses addresses with the same 6:2 bits twice, which according to those quotes should cause bank conflicts.

 

Which is it? Do sequential uint2s cause bank conflicts? Or is it that, while the first two quotes are technically accurate, it would be better to say "Threads 0 through 15, 16 through 31, 32 through 47, and 48 through 63 are checked for conflicts within a wavefront", since each wavefront is executed in quarter-wavefront units?
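To show the two readings side by side, here is a small sketch that counts how many times the busiest bank is hit when conflicts are checked per half-wavefront versus per quarter-wavefront. It is a toy model under my own assumptions: every 4-byte word a thread touches counts as one hit on bank (address bits 6:2), and all hits inside one checking group compete with each other. The function names are hypothetical, and whether the hardware actually behaves this way is exactly what I am asking.

```python
from collections import Counter


def lds_bank(byte_addr):
    """Bank index from address bits 6:2 (32 banks, 4 bytes wide)."""
    return (byte_addr >> 2) & 31


def max_hits_per_bank(group_size, bytes_per_thread=8, wave_size=64):
    """Worst-case hits on a single bank within any one checking group.

    Thread t is assumed to read bytes_per_thread sequential bytes
    starting at t * bytes_per_thread (8 models a float2 / uint2 read).
    A result of 1 means conflict-free under this model.
    """
    worst = 0
    for start in range(0, wave_size, group_size):
        hits = Counter()
        for t in range(start, start + group_size):
            base = t * bytes_per_thread
            # Count one hit per 4-byte word the thread touches.
            for offset in range(0, bytes_per_thread, 4):
                hits[lds_bank(base + offset)] += 1
        worst = max(worst, max(hits.values()))
    return worst


half_wave = max_hits_per_bank(group_size=32)     # half-wavefront check
quarter_wave = max_hits_per_bank(group_size=16)  # quarter-wavefront check
```

In this model `half_wave` comes out as 2 (every bank is hit twice per 32 threads) and `quarter_wave` comes out as 1, which matches the "conflict-free float2" claim only if the check happens per quarter-wavefront.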
