SGPRs are scalar general-purpose registers.
In most cases, a lower NumSGPR value is better. In some cases, however, you can over-optimize and degrade performance. Whether this happens depends on the algorithm and its implementation, so it is not a rule that holds in general. For example, if your implementation has good caching behavior and you lower the GPR count, more wavefronts can be in flight at once, and the added wavefronts can thrash the caches and make performance worse.
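The trade-off can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names, the register file size, the wavefront cap, and the per-wavefront cache footprint are all hypothetical placeholders, not values for any particular hardware, and real occupancy also depends on other resources that are ignored here.

```python
def waves_in_flight(sgprs_per_wave, sgpr_file_size=800, max_waves=10):
    """Wavefronts that fit when SGPRs are the only limit.

    sgpr_file_size and max_waves are illustrative placeholders.
    """
    return min(max_waves, sgpr_file_size // sgprs_per_wave)


def working_set_fits(sgprs_per_wave, bytes_per_wave, cache_bytes):
    """True if the combined footprint of all resident wavefronts fits in cache."""
    waves = waves_in_flight(sgprs_per_wave)
    return waves * bytes_per_wave <= cache_bytes


# Hypothetical kernel: each wavefront touches 2048 bytes, cache holds 16384 bytes.
for sgprs in (100, 80):
    waves = waves_in_flight(sgprs)
    fits = working_set_fits(sgprs, bytes_per_wave=2048, cache_bytes=16384)
    print(f"{sgprs} SGPRs per wave: {waves} waves in flight, working set fits: {fits}")
```

In this model, dropping from 100 to 80 SGPRs per wavefront raises the number of resident wavefronts from 8 to 10, and the combined working set no longer fits in the cache. Whether that costs more than the extra wavefronts gain is something only profiling the actual kernel can show.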