I was wondering whether there is any reason other than "The compiler generates spill code (shuffling values to, and from, memory) if it cannot fit all the live values into registers." (Programming Guide) for the use of scratch registers.
The reason I ask is that I have a kernel which reports the following in its generated ISA file:
MaxScratchRegsNeeded = 92
SQ_PGM_RESOURCES:NUM_GPRS = 36
I have other kernels which report using 64 GPRs, so I wonder why the compiler wouldn't trade some of the scratch usage for some additional registers.
There is a fixed number of GPRs in every Compute Unit, and they have to be shared by all the work-items executing on that Compute Unit.
So you can have more GPRs allocated per work-item if fewer work-items have to run on that CU. Try reducing the work-group size from 256 to 128 or 64, and you should most likely be able to get rid of the scratch registers.