Here are some facts about VGPRs (vector general-purpose registers) in the GCN architecture:
- Each VGPR is 32 bits wide [adjacent VGPRs are combined to hold 64-bit or 128-bit data]
- Total of 256KB, i.e. 65536 VGPRs, per Compute Unit (CU)
- 64KB, i.e. 16384 VGPRs, per SIMD [as each CU has 4 SIMDs]
- Each SIMD's private VGPRs (i.e. 64KB) are shared by all in-flight wavefronts on that SIMD (max. 10) and by all threads (i.e. 64) of each wavefront
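The sizes in the list follow from simple division, sketched below in Python. The constant names and the `vgprs_for` helper are hypothetical, introduced only for illustration; the figures themselves come from the list above. The last step shows that each lane of a SIMD has a budget of 256 VGPRs, shared across all wavefronts resident on that SIMD.

```python
# Register-file sizing for one GCN Compute Unit, using the figures listed above.
BYTES_PER_VGPR = 4            # each VGPR is 32 bits wide
CU_VGPR_BYTES = 256 * 1024    # 256KB of VGPR storage per CU
SIMDS_PER_CU = 4
LANES_PER_WAVEFRONT = 64

vgprs_per_cu = CU_VGPR_BYTES // BYTES_PER_VGPR            # 65536
vgprs_per_simd = vgprs_per_cu // SIMDS_PER_CU             # 16384
# Per-lane VGPR budget of one SIMD, shared by every wavefront on that SIMD.
vgprs_per_lane = vgprs_per_simd // LANES_PER_WAVEFRONT    # 256


def vgprs_for(bits):
    """Hypothetical helper: adjacent 32-bit VGPRs needed for one value of `bits` width."""
    return -(-bits // 32)  # ceiling division


print(vgprs_per_cu, vgprs_per_simd, vgprs_per_lane)  # 65536 16384 256
print(vgprs_for(32), vgprs_for(64), vgprs_for(128))  # 1 2 4
```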
For example, to support 10 in-flight wavefronts on a SIMD, the maximum average number of VGPRs per thread is 16384 / (10 * 64) = 25.6, which in practice means 24, since VGPRs are allocated per thread in groups of 4. If the VGPRs per thread are doubled to 48, only half as many (i.e. 5) wavefronts can be in flight on a single SIMD.
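The same arithmetic can be turned into a small occupancy estimate. This is a sketch under stated assumptions: the function name is hypothetical, the allocation granularity of 4 VGPRs per thread is assumed, and only the VGPR limit is modeled (other resources such as SGPRs or LDS may lower occupancy further, and are ignored here).

```python
VGPRS_PER_LANE = 256      # 16384 VGPRs per SIMD / 64 lanes
MAX_WAVES_PER_SIMD = 10   # hardware cap on in-flight wavefronts per SIMD
ALLOC_GRANULARITY = 4     # assumption: VGPRs are allocated per thread in blocks of 4


def max_waves_per_simd(vgprs_per_thread):
    """Hypothetical helper: in-flight wavefronts per SIMD allowed by VGPR usage alone."""
    if not 0 < vgprs_per_thread <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be in 1..256")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_per_thread // ALLOC_GRANULARITY) * ALLOC_GRANULARITY
    # Equivalent to 16384 // (64 * allocated), capped at the 10-wavefront limit.
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


for v in (24, 25, 48, 64, 128):
    print(v, max_waves_per_simd(v))
```

With these assumptions, 24 VGPRs per thread still allows 10 wavefronts, 25 drops to 9 (it is rounded up to 28), and 48 allows 5, matching the example above.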
So, in your scenario where each thread accesses many registers, the actual number of in-flight wavefronts is limited by how many VGPRs each thread needs relative to the VGPRs available on the SIMD.