1 Reply Latest reply on Dec 17, 2012 1:19 PM by shuster

    LDS access patterns


      Hi, I have a question about best practices for local memory accesses. Section 6.2 of the AMD OpenCL programming guide (v2.4), on page 6-17 reads:


      "A simple sequential address pattern, where each work-item reads a float2 value from LDS, generates a conflict-free access pattern on the ATI Radeon™ HD 5870 GPU. Note that a sequential access pattern, where each work-item reads a float4 value from LDS, uses only half the banks on each cycle on the ATI Radeon™ HD 5870 GPU and delivers half the performance of the float access pattern."


      The first sentence makes sense to me. I have two questions about the second:

      1. Am I correct in assuming that the second sentence refers to the first and should read "...delivers only half the performance of the float2 access pattern"?

      2. I understand that a quarter wavefront accessing float4 values will generate bank conflicts. However, (if my above assumption is correct) how does this deliver only half the performance of a float2 access pattern?


      For example, suppose each work-item ultimately needs to access 4 float values. If each work-item reads a single float4, bank conflicts will occur and we'll need 2 cycles (?) to service a quarter wavefront of 16 work-items. However, if each work-item reads float2 values instead, we'll need a loop that iterates twice, reading one float2 per iteration (1 cycle each), yielding the same (?) total access time of 2 cycles. I'm probably missing something important about how bank conflicts are resolved...
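To make the counting above concrete, here is a toy model of my reasoning, not of the real hardware. It assumes 32 LDS banks that are each 4 bytes wide, a quarter wavefront of 16 work-items, tightly packed sequential addresses, and that the busiest bank alone determines the cycle count. The names `NUM_BANKS`, `BANK_WIDTH`, `GROUP`, and `cycles` are made up for illustration, and the model ignores how the hardware actually issues and schedules requests, which may be exactly the part I'm missing.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: number of LDS banks
BANK_WIDTH = 4   # assumption: bank width in bytes (one float)
GROUP = 16       # work-items in a quarter wavefront


def cycles(vec_floats, base_offset=0):
    """Naive cycle count for one vector read per work-item.

    Work-item wi reads vec_floats consecutive floats starting at
    base_offset + wi * vec_floats * BANK_WIDTH (a sequential pattern).
    The bank with the most requests sets the number of cycles.
    """
    hits = Counter()
    stride_bytes = vec_floats * BANK_WIDTH
    for wi in range(GROUP):
        start = base_offset + wi * stride_bytes
        for k in range(vec_floats):
            addr = start + k * BANK_WIDTH
            hits[(addr // BANK_WIDTH) % NUM_BANKS] += 1
    return max(hits.values())


# One float4 read per work-item: 64 floats spread over 32 banks.
float4_once = cycles(4)

# Two float2 reads per work-item; the second pass starts right after
# the 16 * 2 floats consumed by the first pass.
second_pass_offset = GROUP * 2 * BANK_WIDTH
float2_twice = cycles(2) + cycles(2, base_offset=second_pass_offset)

print("float4, one pass  :", float4_once, "cycles")
print("float2, two passes:", float2_twice, "cycles")
```

Under these assumptions both variants come out to 2 cycles, which is the equality I describe above. The model does not reproduce the guide's statement that the float4 pattern "uses only half the banks on each cycle", so the difference presumably lies in something this simple count leaves out.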


      By the way, AMD, your documentation is great and has been immensely helpful thus far. Thanks!