I am trying to allocate two uint16 arrays in private memory (128 KB). From what I understood, SI Tahiti has 256 KB of VGPRs. When I try to compile the kernel for SI Tahiti with CodeXL, I get an "insufficient resources" error. I am able to compile the kernel with two uint16 arrays, and it uses only 92 VGPRs. Is there an internal limit on how many VGPRs a single kernel can use?
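To make the register arithmetic in the question concrete, here is a rough sketch that compares the per-work-item register budget with what the arrays would need. It assumes the commonly cited GCN figures for Tahiti: the 256 KB of VGPR storage belongs to a whole compute unit, which is split across 4 SIMD units, each running 64-lane wavefronts, and each VGPR holds 32 bits per work-item. It also assumes the 128 KB figure is the per-work-item size of both arrays combined. The constant and function names are illustrative only, and the real limit the compiler applies may be somewhat lower if it reserves registers.

```python
# Assumed SI / GCN "Tahiti" register-file figures (illustrative, not from the post):
VGPR_BYTES_PER_CU = 256 * 1024   # 256 KB of vector registers per compute unit
SIMDS_PER_CU = 4                 # the CU's register file is split across 4 SIMD units
WAVEFRONT_SIZE = 64              # work-items (lanes) per wavefront
BYTES_PER_VGPR_LANE = 4          # one VGPR holds 32 bits per work-item


def max_vgprs_per_work_item() -> int:
    """Upper bound on 32-bit VGPRs one work-item can use (one wavefront owning a whole SIMD)."""
    bytes_per_simd = VGPR_BYTES_PER_CU // SIMDS_PER_CU   # 64 KB per SIMD
    bytes_per_lane = bytes_per_simd // WAVEFRONT_SIZE    # 1 KB per work-item
    return bytes_per_lane // BYTES_PER_VGPR_LANE         # 256 VGPRs


def vgprs_needed(private_bytes: int) -> int:
    """VGPRs required if all `private_bytes` of per-work-item data stayed in registers."""
    return -(-private_bytes // BYTES_PER_VGPR_LANE)  # ceiling division


UINT16_BYTES = 16 * 4  # an OpenCL uint16 is 16 x 32-bit uints = 64 bytes

if __name__ == "__main__":
    print("max VGPRs per work-item:", max_vgprs_per_work_item())
    print("VGPRs per uint16 element:", vgprs_needed(UINT16_BYTES))
    print("VGPRs for 128 KB of arrays:", vgprs_needed(128 * 1024))
```

Under these assumptions, one work-item can address at most about 256 VGPRs (1 KB), while 128 KB of private data would need roughly 32768, which would be consistent with the "insufficient resources" error.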