Still confused about LDS

Discussion created by maxdz8 on Jun 17, 2014
Latest reply on Sep 12, 2014 by maxdz8

In the AMD APP Programming Guide, chapter 5.2, Local Memory (LDS) Optimizations, I read:

Bank conflicts are determined by what addresses are accessed on each half wavefront boundary. Threads 0 through 31 are checked for conflicts as are threads 32 through 63 within a wavefront.

 

In a single cycle, local memory can service a request for each bank

...

The LDS hardware examines the requests generated over two cycles (32 work-items of execution) for bank conflicts. Ensure, as much as possible, that the memory requests generated from a quarter-wavefront avoid bank conflicts by using unique address bits 6:2

The example about the 64-bit access pattern seems to agree that this is an optimal access pattern. Yet 64 bits per WI span 2 banks, so 16 WIs fill all the banks, which is exactly one clock of work for the SIMD but only one quarter of a wavefront. The document is very clear in warning about quarter-wavefront conflicts, but why does it use this wording if conflicts are checked across 0-31 and 32-63?

I have trouble understanding the half-wavefront conflict rule. If a bank services one request each clock, why are conflicts checked among 0-31 and 32-63 instead of 0-15, 16-31, 32-47 and 48-63?

I suppose this would make sense if LDS had a 1-clock latency but it doesn't seem to be the case from what I read.
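
To make the arithmetic concrete, here is a minimal sketch that counts how many dword requests land on the busiest bank when work-items are grouped by 16 or by 32. It is only a toy model, not a description of what the hardware actually does: it assumes 32 banks of 4 bytes each (inferred from "address bits 6:2"), and the names `bank_of` and `worst_bank_load` are made up for this illustration.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: address bits 6:2 (5 bits) select the bank
BANK_WIDTH = 4   # assumption: each bank is one dword (4 bytes) wide


def bank_of(byte_addr):
    """Bank index taken from address bits 6:2 of a byte address."""
    return (byte_addr // BANK_WIDTH) % NUM_BANKS


def worst_bank_load(access_bytes, stride_bytes, group_size, wavefront=64):
    """Split the wavefront into groups of `group_size` consecutive
    work-items and return the highest number of dword requests that
    any single bank receives within one group."""
    worst = 0
    for start in range(0, wavefront, group_size):
        hits = Counter()
        for wi in range(start, start + group_size):
            base = wi * stride_bytes
            # A 64-bit access is modelled as two consecutive dword requests.
            for off in range(0, access_bytes, BANK_WIDTH):
                hits[bank_of(base + off)] += 1
        worst = max(worst, max(hits.values()))
    return worst


if __name__ == "__main__":
    # 64-bit per work-item, tightly packed (8-byte stride)
    print("64-bit, groups of 16:", worst_bank_load(8, 8, 16))
    print("64-bit, groups of 32:", worst_bank_load(8, 8, 32))
    # 32-bit per work-item, tightly packed (4-byte stride)
    print("32-bit, groups of 32:", worst_bank_load(4, 4, 32))
```

In this model the 64-bit pattern puts exactly one request on each bank per group of 16 work-items, but two requests per bank per group of 32. That is the source of my confusion: under a half-wavefront check the 64-bit pattern would touch every bank twice, yet the guide presents it as optimal.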

 

Can someone explain what is going on here?

 

I have a kernel that, surprisingly, shows almost 12% bank stalls, so I guess it's time for me to understand LDS completely.
