4 Replies Latest reply on Jun 10, 2017 8:43 AM by nerdralph

    GCN L1 data cache architecture

    nerdralph

      The GCN docs indicate the L1 data cache bandwidth is 64 bytes/clock, but don't provide any details as to how this is arbitrated between the 4 SIMD units in each CU.  Is it banked so each SIMD can get 16 bytes per clock, or does one SIMD at a time get a full cache line by some arbitration mechanism?
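A minimal sketch of the two arbitration schemes the question contrasts. It assumes 64 bytes/clock of total L1 bandwidth and 4 SIMDs per CU (both from the post above). The strict round-robin policy in scheme B is a hypothetical choice made for illustration, not a documented GCN mechanism, and the function names are made up.

```python
# Hypothetical model of the two L1 arbitration schemes asked about.
# Assumed figures: 64 B/clock total bandwidth, 4 SIMDs per CU.
L1_BYTES_PER_CLOCK = 64
SIMDS_PER_CU = 4


def banked_bytes_per_simd() -> int:
    """Scheme A (banked): every SIMD gets an equal slice on every clock."""
    return L1_BYTES_PER_CLOCK // SIMDS_PER_CU


def arbitrated_bytes_per_simd(clock: int, simd: int) -> int:
    """Scheme B (assumed round-robin): one SIMD owns all 64 B on a given clock."""
    return L1_BYTES_PER_CLOCK if clock % SIMDS_PER_CU == simd else 0


if __name__ == "__main__":
    per_simd_over_4_clocks = sum(arbitrated_bytes_per_simd(c, 0) for c in range(4))
    print(banked_bytes_per_simd(), per_simd_over_4_clocks / 4)
```

Averaged over four clocks, both schemes give each SIMD the same 16 bytes/clock. They differ in burstiness: under scheme B a single SIMD can pull a whole 64-byte line in one clock, which is why the access pattern matters.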

        • Re: GCN L1 data cache architecture
          dipak

          Hi Ralph,

          Here is the suggested response from the relevant team:

          One wavefront is serviced at a time (over some number of clocks), so it’s best if wavefronts fetch one or more entire cachelines to get peak L1$ bandwidth.
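To illustrate why fetching whole cache lines matters, here is a rough sketch that counts the distinct cache lines a wavefront's addresses touch. It assumes a 64-byte line and, purely hypothetically, that the L1 delivers a fixed number of lines per clock to the wavefront being serviced. The reply above does not say how many clocks servicing takes.

```python
# Rough model: clocks to service one wavefront ~ distinct cache lines touched.
# Assumptions (not confirmed by AMD): 64-byte lines, fixed lines per clock.
LINE_BYTES = 64


def lines_touched(addresses):
    """Number of distinct cache lines covered by the per-thread byte addresses."""
    return len({addr // LINE_BYTES for addr in addresses})


def service_clocks(addresses, lines_per_clock=1):
    """Ceiling of lines touched / lines delivered per clock (hypothetical rate)."""
    return -(-lines_touched(addresses) // lines_per_clock)


if __name__ == "__main__":
    coalesced = [4 * i for i in range(64)]        # 64 consecutive dwords, aligned
    misaligned = [32 + 4 * i for i in range(64)]  # same size, starts mid-line
    strided = [64 * i for i in range(64)]         # one dword per line
    for name, addrs in [("coalesced", coalesced), ("misaligned", misaligned),
                        ("strided", strided)]:
        print(name, lines_touched(addrs), service_clocks(addrs))
```

Under these assumptions, an aligned, contiguous dword load covers 4 lines, shifting it by half a line costs a fifth line, and a one-dword-per-line pattern costs 64.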


          Regards,


            • Re: GCN L1 data cache architecture
              nerdralph

              Does "some number of clocks" mean a variable number, or is it a fixed number (say, 4) that they just aren't being specific about?

              If it's variable, does that mean all of a wavefront's memory reads (up to 64) that have returned from the L2 are serviced before another wavefront is?

              In other words, if a wavefront has executed FLAT_LOAD_DWORD, and each of the 64 threads is loading from a different random address in memory, will it take 64 continuous cycles to service that wavefront?
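A small simulation of the scenario in the question: a FLAT_LOAD_DWORD where each of the 64 threads reads a random dword. It assumes a 64-byte line and a 1 GiB address range, and it treats "one line serviced per clock" as a hypothetical rate. That rate is exactly what the question is asking AMD to confirm, so this only estimates how many lines would have to be serviced.

```python
import random

# Assumptions for illustration only: 64-byte L1 lines, 64-thread wavefront,
# and (hypothetically) one cache line serviced per clock.
LINE_BYTES = 64
WAVE_SIZE = 64


def random_wave_addresses(span_bytes, rng):
    """One dword-aligned random byte address per thread within span_bytes."""
    return [rng.randrange(0, span_bytes, 4) for _ in range(WAVE_SIZE)]


def distinct_lines(addresses):
    """Distinct cache lines the wavefront touches (= clocks at 1 line/clock)."""
    return len({a // LINE_BYTES for a in addresses})


if __name__ == "__main__":
    rng = random.Random(1)
    addrs = random_wave_addresses(1 << 30, rng)  # random reads over 1 GiB
    print("lines touched:", distinct_lines(addrs))
```

With 2^24 lines in a 1 GiB range, two threads landing in the same line is very unlikely, so nearly every random load touches its own line. Under the one-line-per-clock assumption, that comes to about 64 clocks for the wavefront.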