AMD's OpenCL User's Guide has tables showing the number of clock
cycles it takes to execute various integer, single-precision (fp32), and
double-precision (fp64) floating-point operations on the more recent
GPU models. I would be interested in similar tables for the latest chips.
Also, any changes in the size of the LDS per CU and the number of VGPRs per compute
element in the new cards would be of interest to those who use the GPUs as coprocessors.