6 Replies Latest reply on Nov 24, 2015 11:10 AM by tzachi.cohen

    GCN LDS size

    ekondis

      As far as I was aware, each CU in GCN GPUs has 64KB of LDS. However, the respective ISA manuals state a size of 32KB per CU (e.g. paragraph 2.3.1, page 2-4 of AMD_Sea_Islands_Instruction_Set_Architecture1.pdf). What is the actual size, and why is there such confusion?

        • Re: GCN LDS size
          realhet

          Hi,

           

          32KB is the maximum that can be allocated by a single workgroup. If there are at least 2 workgroups queued on a CU then it is possible to use all the physical 64KB of LDS.
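
          As a rough sketch of that rule, the arithmetic below estimates how many workgroups fit on one CU when LDS is the only limit. The function names are made up for illustration, and other occupancy limits (registers, wavefront slots, allocation granularity) are deliberately ignored, so treat it as an upper bound rather than what the hardware will actually schedule.

```python
# Sizes as discussed in this thread; not queried from any device.
LDS_BYTES_PER_CU = 64 * 1024             # physical LDS per compute unit
MAX_LDS_BYTES_PER_WORKGROUP = 32 * 1024  # most a single workgroup may allocate


def workgroups_per_cu_by_lds(lds_bytes_per_workgroup):
    """Upper bound on resident workgroups per CU, considering LDS only."""
    if not 0 < lds_bytes_per_workgroup <= MAX_LDS_BYTES_PER_WORKGROUP:
        raise ValueError("a workgroup must request between 1 byte and 32KB of LDS")
    return LDS_BYTES_PER_CU // lds_bytes_per_workgroup


def lds_bytes_in_use(lds_bytes_per_workgroup):
    """LDS actually occupied when that many workgroups are resident."""
    return workgroups_per_cu_by_lds(lds_bytes_per_workgroup) * lds_bytes_per_workgroup


if __name__ == "__main__":
    for kb in (32, 24, 16):
        n = workgroups_per_cu_by_lds(kb * 1024)
        used_kb = lds_bytes_in_use(kb * 1024) // 1024
        print(f"{kb}KB per workgroup: {n} workgroups, {used_kb}KB of LDS in use")
```

          With 32KB per workgroup this gives two resident workgroups and all 64KB in use; with 24KB it still gives two, but only 48KB of the LDS is occupied.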

            • Re: GCN LDS size
              ekondis

              That is what I was aware of, but the ISA reference states: "Each compute unit has a 32 kB memory space that enables low-latency communication between work-items within a work-group, or the work-items within a wavefront; this is the local data share (LDS)." (paragraph 2.3.1 "Sea Islands ISA reference").

               

              The same fact is depicted in figure 2.1, where each CU is illustrated with an LDS labeled as 32KB. It doesn't seem to note anywhere that the physical size is 64KB.

                • Re: GCN LDS size
                  realhet

                  Evergreen had 32KB. I guess they just copied the manual and didn't check.

                   

                  Look at any of the presentations and diagrams advertising the GCN architecture: they all say 64KB. For example, the GCN white paper states: "The LDS capacity in GCN has doubled to 64KB with 16 or 32 banks (depending on the product)."

                   

                  Or if you're still not sure, check Table 5.10 (LDS_ALLOC) in the ISA manual. This is new information that was not copied. The LDS base offset is 8 bits wide and addresses 64-DWord cells, which gives 64KB in total. However, according to that manual the LDS size field has 9 bits (128KB), which is wrong, because it has only 7 bits.
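
                  The arithmetic behind those numbers can be checked with a few lines. This is only a sketch of the calculation, assuming the 64-DWord granularity and 4-byte DWords mentioned above; the function name is made up for illustration and nothing here reads real hardware registers.

```python
DWORD_BYTES = 4       # one DWord is 32 bits
GRANULE_DWORDS = 64   # LDS_ALLOC fields count 64-DWord cells, per the post


def addressable_bytes(field_bits):
    """Bytes spanned by a field of `field_bits` bits counting 64-DWord cells."""
    return (1 << field_bits) * GRANULE_DWORDS * DWORD_BYTES


if __name__ == "__main__":
    for bits in (7, 8, 9):
        print(f"{bits}-bit field: {addressable_bytes(bits) // 1024}KB")
```

                  An 8-bit field spans 64KB, a 9-bit field would span 128KB, and a 7-bit field spans 32KB, which matches the per-workgroup maximum.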

                    • Re: GCN LDS size
                      ekondis

                      AMD should put more effort into keeping their documentation up to date.

                       

                      Thanks for your clarification.

                        • Re: GCN LDS size
                          realhet

                          Yeah, the low-level documentation is just not 100% correct. Not many of us read it, but I'm happy that it exists at all.

                           

                          So there is an lds_direct input operand for every GCN instruction. (I've never used it, though.)

                          It takes a 16-bit byte offset from the low part of the M0 register, so maybe it could be used to address 64 KBytes.
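
                          To make the 16-bit point concrete, here is a small sketch that masks out the low 16 bits of an M0 value and shows the range they cover. The names are made up for illustration, and it only assumes what is said above (byte offset in the low 16 bits); it makes no claim about what the upper bits of M0 hold.

```python
M0_LDS_OFFSET_MASK = 0xFFFF  # low 16 bits of M0, per the post


def lds_direct_byte_offset(m0):
    """Return the LDS byte offset carried in the low 16 bits of an M0 value."""
    return m0 & M0_LDS_OFFSET_MASK


# A 16-bit byte offset can reach this many distinct bytes.
ADDRESSABLE_BYTES = M0_LDS_OFFSET_MASK + 1

if __name__ == "__main__":
    print(hex(lds_direct_byte_offset(0x00012345)))  # upper bits are discarded
    print(ADDRESSABLE_BYTES // 1024, "KB")
```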

                          But the LDS hardware has address calculation logic that doesn't allow you to read or write outside your LDS area. (If you use 128-bit resource constants properly, there is 'protection' for GPU memory as well.)

                          So I think every wavefront queued on a CU could physically access any place in the LDS memory; it's only the driver that restricts the area those wavefronts can access, and you just can't do anything about it.

                           

                          (Btw, there are more interesting things about the LDS than its size. For example, if you have an 8KB lookup table that is accessed all the time, you should serve 75% of the accesses from the LDS and 25% from RAM (through the L1 cache); that leads to optimum performance on GCN.)

                  • Re: GCN LDS size
                    tzachi.cohen

                    Each compute unit has 64KB of LDS; however, at most 32KB can be allocated to a single wavefront (or workgroup, for that matter).

                    I.e. you can utilize the full 64KB of LDS as long as you have at least two wavefronts executing concurrently on the same CU.

                     

                    Tzachi