Hopefully someone from AMD techpubs will see this. I was trying to look up the bandwidths of the various memory systems in rev 2.7 of the AMD APP OpenCL Programming Guide and found contradictory information about the bandwidths of the register file and LDS. I found the following claims for bandwidth per stream processor per cycle:
Register file: 48B (6-11), 12B (6-15)
LDS: 2B (6-10, based on 14x ratio to global), 8B (6-11 and 6-15), 1/6 of reg (6-11)
The only way the numbers make sense to me is if it is 12B for registers (which makes sense for 2 inputs and 1 output) and 2B for LDS (which makes sense for 32x4B banks shared by 64 processors). It would be great if this could be fixed in future versions of the document.
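The reasoning above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the 4-byte operand size is my assumption (the guide excerpts quoted here do not state it).

```python
# Sanity check of the per-processor, per-cycle figures argued for above.
# Assumption: every register operand is a 4-byte value.
OPERAND_BYTES = 4


def register_bytes_per_cycle(inputs=2, outputs=1, operand_bytes=OPERAND_BYTES):
    """Bytes moved to/from the register file by one processor in one cycle."""
    return (inputs + outputs) * operand_bytes


def lds_bytes_per_cycle(banks=32, bank_width_bytes=4, processors=64):
    """LDS bytes available per processor per cycle when the banks are shared."""
    return banks * bank_width_bytes / processors


print(register_bytes_per_cycle())  # 12
print(lds_bytes_per_cycle())       # 2.0
```

With these two figures the LDS-to-register ratio is 2/12 = 1/6, which agrees with the "1/6 of reg" statement on 6-11.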
Thanks for reporting it. As per section "Device Parameters", if you calculate the peak read bandwidth per Processing Element for the register file and LDS on Pitcairn XT (which is based on GCN), the numbers come out as follows:
Register Peak Read Bandwidth / Processing Element = 15360 / (1 * 1280) = 12 B/cycle
LDS Peak Read Bandwidth / Processing Element = 2560 / (1 * 1280) = 2 B/cycle
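The same calculation as a small script, in case anyone wants to repeat it for other devices in the table. Treat it as a sketch: I am assuming the peak figures are in GB/s, that the "1" is the engine clock in GHz, and that 1280 is the number of processing elements. The function and parameter names are mine, not from the guide.

```python
def per_element_bandwidth(peak_gb_per_s, clock_ghz, processing_elements):
    """Bytes per cycle per processing element.

    Assumes the peak figure is in GB/s and the clock in GHz, so that
    GB/s divided by GHz gives bytes per cycle for the whole device.
    """
    return peak_gb_per_s / (clock_ghz * processing_elements)


# Pitcairn XT values as used in the calculation above.
register_bw = per_element_bandwidth(15360, 1, 1280)
lds_bw = per_element_bandwidth(2560, 1, 1280)

print(register_bw)  # 12.0
print(lds_bw)       # 2.0
```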
But the numbers in the guide seem okay for NI or Evergreen.
So the numbers do indeed look confusing for GCN (at least the term "peak read bandwidth per stream core" as applied to GCN). I've asked someone for clarification and forwarded a request to update the corresponding sections of the guide if required. I'll let you know as soon as I get a reply.