Currently, there is no compiler option that directly controls register usage or register allocation. Generally, the compiler tries to minimize register usage so that more wavefronts can be in flight, which increases GPU occupancy. Also, without knowing the work-group size, the compiler must assume an upper-bound size so that it does not allocate more registers per work-item than the hardware actually contains.
One way to give the compiler a hint is to specify a smaller work-group size at compile time (with the reqd_work_group_size kernel attribute). This allows the compiler to allocate more registers per work-item, which can avoid spill code and improve performance. Please note that it is still a good idea to rewrite the algorithm to use fewer registers and to avoid allocating a large array in private memory.
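As a rough sketch of the reqd_work_group_size hint mentioned above: the kernel name "scale", its body, and the size 64 are made up for illustration and are not taken from your code. The kernel source is shown as a C string, the way host code would typically hand it to clCreateProgramWithSource.

```c
#include <stddef.h>

/* OpenCL C kernel source held as a host-side string.
 * The kernel "scale" and its body are hypothetical; only the attribute
 * syntax matters here. The attribute promises the compiler that the
 * kernel is always launched with a 64x1x1 work-group. */
static const char *scale_kernel_src =
    "__kernel __attribute__((reqd_work_group_size(64, 1, 1)))\n"
    "void scale(__global float *data, const float factor)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] *= factor;\n"
    "}\n";

/* Local work size to pass to clEnqueueNDRangeKernel.
 * It has to match the attribute exactly, otherwise the enqueue fails. */
static const size_t scale_local_size[3] = { 64, 1, 1 };
```

On GCN, 64 work-items is one wavefront, so this is the smallest size that still fills a wavefront. Whether the compiler actually uses the extra register budget depends on the kernel and the driver version, so it is worth checking the reported register usage after compiling.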
By the way, on GCN devices, the number of active wavefronts per SIMD = floor(256 / #VGPRs used by the kernel), up to a maximum of 10 [assuming 4-byte data types].
In the above case, if the array is allocated in registers, it is likely that the kernel uses more than 128 VGPRs. That leaves only 1 wavefront per SIMD, i.e. an occupancy of only 10% (1 of the 10 wavefronts a SIMD can hold).
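To make the arithmetic above concrete, here is a small C sketch of the VGPR-limited estimate. The function and constant names are my own, and the estimate deliberately ignores everything else that can limit occupancy (SGPR usage, LDS usage, work-group size, and register allocation granularity), so treat it as an upper bound only.

```c
/* Assumed GCN figures: 256 VGPRs available per SIMD lane and at most
 * 10 wavefronts resident per SIMD. Names are illustrative only. */
enum {
    GCN_VGPRS_PER_LANE     = 256,
    GCN_MAX_WAVES_PER_SIMD = 10
};

/* Upper bound on resident wavefronts per SIMD from VGPR usage alone. */
static int gcn_waves_per_simd(int vgprs_per_work_item)
{
    if (vgprs_per_work_item <= 0) {
        /* No VGPR pressure reported: only the hardware cap applies. */
        return GCN_MAX_WAVES_PER_SIMD;
    }
    /* Integer division gives floor(256 / VGPRs). */
    int waves = GCN_VGPRS_PER_LANE / vgprs_per_work_item;
    return waves > GCN_MAX_WAVES_PER_SIMD ? GCN_MAX_WAVES_PER_SIMD : waves;
}

/* The same bound expressed as a percentage of the 10-wavefront maximum. */
static int gcn_occupancy_percent(int vgprs_per_work_item)
{
    return gcn_waves_per_simd(vgprs_per_work_item) * 100
           / GCN_MAX_WAVES_PER_SIMD;
}
```

For example, gcn_waves_per_simd(129) gives 1 wavefront (10% occupancy), while getting the kernel down to 64 VGPRs gives 4 wavefronts (40%).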
Thanks.