Hi,
Register allocation is the compiler's job. During compilation, the compiler decides how to assign a kernel's variables to a small, finite set of registers, aiming to keep as many variables in registers as possible. The compiler tries to map private memory allocations to the pool of GPRs in the GPU. When GPRs are not available, private memory is mapped to the “scratch” region. GPRs have some restrictions on which register ports can be read in each cycle, but these are generally not exposed to the OpenCL programmer.
In GCN devices, there are two types of GPRs: scalar GPRs (SGPRs) and vector GPRs (VGPRs). Each CU has four vector units and one scalar unit, and each vector unit has its own SGPR and VGPR pool: 512 SGPRs and 256 VGPRs per vector unit. The vector unit handles all vector instructions, that is, any instruction that is executed per thread. SGPRs are used for scalar instructions, that is, any instruction that is executed once per wavefront, such as a branch, a scalar ALU instruction, or a constant cache fetch. SGPRs are also used for constants, all buffer/texture definitions, and sampler definitions; some kernel arguments are stored, at least temporarily, in SGPRs.
So a programmer cannot pick the register type directly. If you want variables to end up in scalar registers instead of vector registers, write the code so that the compiler can use scalar instructions for it: branches, computations on constant memory, and the other once-per-wavefront cases mentioned above.
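
To put the pool sizes above in perspective, here is a rough sketch in C of how register usage limits the number of wavefronts that can be resident on one vector unit. The 256 and 512 figures are the ones quoted above. The cap of 10 wavefronts per vector unit is my assumption for GCN, and waves_per_simd is a made-up helper name, not an AMD API. The sketch ignores register allocation granularity, LDS usage and work-group size limits, so treat the result as an upper bound only.

```c
#include <assert.h>

/* Per-vector-unit register pool sizes quoted in the post above. */
#define VGPRS_PER_SIMD 256
#define SGPRS_PER_SIMD 512

/* Assumption: at most 10 wavefronts can be resident on one vector unit. */
#define MAX_WAVES_PER_SIMD 10

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper: upper bound on resident wavefronts per vector unit,
 * looking at register usage only.
 *
 * vgprs_per_workitem: VGPRs the compiler reports for the kernel
 * sgprs_per_wave:     SGPRs the compiler reports for the kernel
 */
int waves_per_simd(int vgprs_per_workitem, int sgprs_per_wave)
{
    assert(vgprs_per_workitem > 0 && sgprs_per_wave > 0);

    /* Each wavefront takes its own slice of both pools. */
    int by_vgpr = VGPRS_PER_SIMD / vgprs_per_workitem;
    int by_sgpr = SGPRS_PER_SIMD / sgprs_per_wave;

    return min_int(MAX_WAVES_PER_SIMD, min_int(by_vgpr, by_sgpr));
}
```

For example, a kernel that needs 64 VGPRs and 32 SGPRs would be limited to 4 wavefronts per vector unit by its VGPR usage, while one that needs 24 VGPRs and 48 SGPRs would still reach the assumed cap of 10.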