Hopefully someone from AMD techpubs will see this. I was trying to look up the bandwidths of the various memory systems in rev 2.7 of the AMD APP OpenCL Programming Guide and found contradictory figures for the register file and LDS. I found the following claims for bandwidth per stream processor per cycle:
Register file: 48B (6-11), 12B (6-15)
LDS: 2B (6-10, based on 14x ratio to global), 8B (6-11 and 6-15), 1/6 of reg (6-11)
The only way the numbers make sense to me is if it is 12B for registers (which fits 2 inputs and 1 output of 4B each) and 2B for LDS (which fits 32 banks of 4B each shared by 64 processors). It would be great if this could be fixed in future versions of the document.
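
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The parameters (2 inputs and 1 output of 4B each, 32 banks of 4B shared by 64 processors) are my own assumptions from the reasoning above, not values confirmed by the guide, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the per-stream-processor bandwidth figures.
# All default parameters are assumptions taken from the reasoning in this
# post, not values confirmed by the AMD guide.

def register_bytes_per_cycle(inputs=2, outputs=1, operand_bytes=4):
    """Bytes per cycle if each processor reads `inputs` operands and writes `outputs`."""
    return (inputs + outputs) * operand_bytes

def lds_bytes_per_cycle(banks=32, bank_bytes=4, processors=64):
    """Bytes per processor per cycle if every bank delivers one word per cycle."""
    return banks * bank_bytes / processors

reg = register_bytes_per_cycle()
lds = lds_bytes_per_cycle()
print(f"register file: {reg} B, LDS: {lds} B, ratio: {reg / lds}")
```

Under these assumptions the register file comes out at 12B and the LDS at 2B, a ratio of 6, which agrees with the "1/6 of reg" claim. Note that 48B and 8B also give a ratio of 6, so the ratio alone does not tell the two pairs apart.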