It may differ between GPUs, but I'm interested in the 48xx.
You can have at most 128 registers per thread. The number of wavefronts that can execute on a single SIMD is determined by the register usage of your shader (each SIMD has 64*256 registers in total).
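To make the arithmetic concrete, here is a rough sketch of the register budget. It assumes a wavefront is 64 threads and that registers are the only limit on occupancy; any other hardware cap on wavefronts per SIMD is ignored. The function and constant names are made up for illustration.

```python
# Figures quoted in this thread for the 48xx.
REGISTERS_PER_SIMD = 64 * 256      # total registers per SIMD
MAX_REGISTERS_PER_THREAD = 128     # per-thread limit

# Assumption: one wavefront is 64 threads.
WAVEFRONT_SIZE = 64


def wavefronts_per_simd(registers_per_thread: int) -> int:
    """Wavefronts that fit on one SIMD, limited by register usage only."""
    if not 1 <= registers_per_thread <= MAX_REGISTERS_PER_THREAD:
        raise ValueError("registers_per_thread must be between 1 and 128")
    registers_per_wavefront = registers_per_thread * WAVEFRONT_SIZE
    return REGISTERS_PER_SIMD // registers_per_wavefront


def threads_per_simd(registers_per_thread: int) -> int:
    """Threads resident on one SIMD under the same register-only model."""
    return wavefronts_per_simd(registers_per_thread) * WAVEFRONT_SIZE


if __name__ == "__main__":
    for regs in (16, 32, 64, 128):
        print(regs, wavefronts_per_simd(regs), threads_per_simd(regs))
```

Under these assumptions, a shader that uses all 128 registers per thread still fits 2 wavefronts, that is 128 threads, on a SIMD.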
So I can run 128 heavy threads per SIMD without worrying about registers. Thank you.