1 Reply Latest reply on Jul 21, 2014 4:07 AM by dipak

    Register file

    jirmik

      Hi all! How does the register allocation work? Which of the following is right, for each GCN compute unit?

      1) There's a single common register file with 65536 registers for all 64 processing elements in all 4 vector units. (So each register can be assigned to any processing element.)

      2) There are 4 register files with 16384 registers each: one common register file dedicated to all 16 processing elements of each vector unit.

      3) There are 64 register files: one dedicated register file with 1024 registers for each of the 64 processing elements across all 4 vector units.

       

      I'm curious about a scenario with fewer than 64 work items per work group (e.g. 16 work items per work group), where each work item needs many registers. If parts of the register file are dedicated to particular processing elements, some registers would not be accessible at all (because they belong to unused processing elements).

       

      I'm mainly interested in GCN devices. Does anyone know the details?

       

      Thanks,

      Martin Jirman

        • Re: Register file
          dipak

          Hi Martin,

          Here are the relevant facts about vector general-purpose registers (VGPRs) in the GCN architecture:

          • Each VGPR is 32 bits wide [adjacent VGPRs are combined to process 64-bit or 128-bit data]
          • A total of 256 KB, or 65536 VGPRs, per Compute Unit (CU)
          • 64 KB, or 16384 VGPRs, per SIMD [as each CU has 4 SIMDs]
          • Each SIMD's private VGPRs (64 KB) are shared by all in-flight wavefronts on that SIMD (max. 10) and by all 64 threads of each wavefront

           

          For example, to support 10 in-flight wavefronts on a SIMD, the maximum average number of VGPRs per thread is 16384 / (10 * 64) = 25.6, which rounds down to 24. If the average number of VGPRs per thread is doubled to 48, only half as many wavefronts, i.e. 5, can be in flight on a single SIMD.

          So, in your scenario, where each thread uses many registers, the actual number of in-flight wavefronts is limited by the VGPRs available on the SIMD.
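The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation, not an AMD tool: the function name `max_wavefronts` is made up for this example, and the allocation granularity of 4 VGPRs is an assumption used to explain why 25.6 becomes 24. The register count, wavefront size and 10-wavefront limit are the figures quoted above.

```python
# Figures quoted in the reply above.
VGPRS_PER_SIMD = 16384           # 32-bit registers per SIMD (64 KB)
THREADS_PER_WAVEFRONT = 64
MAX_WAVEFRONTS_PER_SIMD = 10


def max_wavefronts(vgprs_per_thread, granularity=4):
    """Estimate how many wavefronts fit on one SIMD, considering VGPRs only.

    `granularity` is an ASSUMPTION: per-thread VGPR counts are taken to be
    rounded up to a multiple of 4 before allocation.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    # Round the per-thread count up to the assumed allocation granularity.
    allocated = -(-vgprs_per_thread // granularity) * granularity
    # Every wavefront reserves registers for all 64 of its threads.
    per_wavefront = allocated * THREADS_PER_WAVEFRONT
    return min(MAX_WAVEFRONTS_PER_SIMD, VGPRS_PER_SIMD // per_wavefront)


for n in (24, 48, 128):
    print(n, "VGPRs per thread ->", max_wavefronts(n), "wavefronts per SIMD")
```

With 24 VGPRs per thread this gives 10 wavefronts, and with 48 it gives 5, matching the numbers above. Other limits (LDS, scalar registers, work-group size) are ignored here, so the real figure can be lower.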

           

          Regards,
