Hi all! How does register allocation work on GCN? Which of the following is right for each GCN compute unit?
1) There's a single common register file with 65536 registers for all 64 processing elements in all 4 vector units. (So each register can be assigned to any processing element.)
2) There are 4 register files with 16384 registers each: one register file per vector unit, shared by all 16 processing elements of that vector unit.
3) There are 64 register files with 1024 registers each: one dedicated register file for each of the 64 processing elements across all 4 vector units.
I'm curious about a scenario with fewer than 64 work items per work group (e.g. 16 work items per work group), where each work item needs many registers. If parts of the register file are dedicated to particular processing elements, some registers would not be accessible at all (because they belong to unused processing elements).
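To make the concern concrete, here is the arithmetic I have in mind as a rough Python sketch. It only uses the numbers from the three options above; the function names are made up for illustration, and the model (one work item per processing element, active elements packed contiguously, each element limited to its own register file) is my own simplifying assumption, not a statement about how the hardware actually behaves.

```python
# Back-of-the-envelope model of the three layouts asked about above.
# Assumption (hypothetical): one work item per processing element (PE),
# active PEs are packed contiguously, and a PE can only use registers
# from the register file it belongs to.

TOTAL_REGISTERS = 65536      # per compute unit, as in option 1
VECTOR_UNITS = 4
PES_PER_VECTOR_UNIT = 16
TOTAL_PES = VECTOR_UNITS * PES_PER_VECTOR_UNIT  # 64


def per_file_size(num_files):
    """Registers in each file if the total is split evenly into num_files."""
    return TOTAL_REGISTERS // num_files


def reachable_registers(active_pes, num_files):
    """Registers reachable by active_pes PEs under the model above."""
    pes_per_file = TOTAL_PES // num_files
    # Ceiling division: number of register files with at least one active PE.
    files_touched = -(-active_pes // pes_per_file)
    return files_touched * per_file_size(num_files)


if __name__ == "__main__":
    for option, num_files in ((1, 1), (2, 4), (3, 64)):
        reachable = reachable_registers(16, num_files)
        print(f"option {option}: {num_files} file(s), "
              f"{reachable} registers reachable by 16 work items")
```

Under this simplified model, 16 work items could reach all 65536 registers with option 1, but only 16384 with option 2 or option 3, which is exactly the situation I'm worried about.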
I'm mainly interested in GCN devices. Does anyone know the details?