See section 4.9.2 in the APP OpenCL Programming Guide.
The basic fact is that there are 16384 GPRs, which are split evenly amongst hardware threads.
As well as the headline GPR allocation reported by SKA, the clause-temporary GPRs need to be accounted for. On the Evergreen series there can be up to 8 of these per work item; earlier GPUs support 4. Cayman doesn't seem to use them (I haven't seen any in SKA), but I'm not sure.
So 8 GPRs per work item * 64 work items per hardware thread * 2 hardware threads in the pipeline = 1024 registers of maximum overhead from clause-temporary registers.
So subtract the actual clause-temporary overhead from 16384 first, then you can perform the simple division that tells you how many hardware threads (wavefronts) you'll get.
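The subtraction and division described above can be sketched in Python as follows. This is only a rough illustration using the figures from this post (16384 GPRs, 64 work items per hardware thread, 2 hardware threads in the pipeline). The function and constant names are made up for the example, and it assumes no other limit (such as a hardware cap on wavefronts or local memory usage) applies.

```python
# Figures taken from the post; names are hypothetical, for illustration only.
TOTAL_GPRS = 16384             # size of the register pool
WORK_ITEMS_PER_WAVEFRONT = 64  # work items per hardware thread
THREADS_IN_PIPELINE = 2        # hardware threads holding clause temporaries


def clause_temp_overhead(clause_temps):
    """Registers reserved for clause temporaries (per work item count given)."""
    return clause_temps * WORK_ITEMS_PER_WAVEFRONT * THREADS_IN_PIPELINE


def max_wavefronts(gprs_per_work_item, clause_temps):
    """Estimate wavefronts from register usage alone.

    gprs_per_work_item: headline GPR count reported by SKA.
    clause_temps: clause-temporary GPRs the kernel actually uses.
    Assumption: registers are the only limiting resource.
    """
    if gprs_per_work_item <= 0:
        raise ValueError("gprs_per_work_item must be positive")
    available = TOTAL_GPRS - clause_temp_overhead(clause_temps)
    return available // (gprs_per_work_item * WORK_ITEMS_PER_WAVEFRONT)


if __name__ == "__main__":
    # Example: a kernel reported at 10 GPRs that uses 4 clause temporaries.
    print(max_wavefronts(10, 4))
```

Passing 8 for clause_temps reproduces the worst-case overhead of 1024 registers mentioned above; passing 0 models a kernel that uses no clause temporaries.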
The numbers in the table look like they account for clause temporary registers, but it seems to be a generalisation for kernels that use 4 temporaries.
nVidia's table accounts not only for registers but also for shared memory, the maximum possible number of blocks in flight and so on. Yes, almost all of this info can be found in the manual (for HD5xxx, not for HD4xx, btw), but nVidia's representation makes life easier, while AMD again and again "says" RTFM ...