LDS Direct Read performance

Question asked by yurtesen on May 26, 2014
Section 9.3.1 states:


LDS Direct reads occur in vector ALU (VALU) instructions and allow the LDS to supply a single DWORD value which is broadcast to all threads in the wavefront and is used as the SRC0 input to the ALU operations. A VALU instruction indicates that input is to be supplied by LDS by using the LDS_DIRECT for the SRC0 field.
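
As I understand it, the functional behaviour is roughly what the Python sketch below models. It is purely illustrative: it shows only the result each thread sees, not the timing or where the hardware holds the broadcast value. The 64-thread wavefront size, the function name `valu_add_lds_direct`, and the choice of an add operation are my own assumptions, not taken from the manual.

```python
WAVEFRONT_SIZE = 64  # assumption: 64 threads per wavefront
MASK32 = 0xFFFFFFFF  # a DWORD is 32 bits


def valu_add_lds_direct(lds, dword_index, src1):
    """Hypothetical model of a VALU add whose SRC0 comes from LDS_DIRECT.

    One DWORD is read from LDS and the same value is used as SRC0
    by every thread; SRC1 is a per-thread register value.
    """
    src0 = lds[dword_index] & MASK32  # single DWORD supplied by LDS
    return [(src0 + lane_value) & MASK32 for lane_value in src1]


lds = [0] * 16                      # tiny stand-in for LDS memory, in DWORDs
lds[3] = 100                        # the value to broadcast
src1 = list(range(WAVEFRONT_SIZE))  # per-thread register contents
result = valu_add_lds_direct(lds, 3, src1)
```

Every thread ends up with the same `src0`, so in this model the result differs from thread to thread only through `src1`.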


I would like to know how many clock cycles of penalty an LDS Direct read incurs compared to using data that is already in a register.


Do the ALUs have some hidden registers to receive the data in SRC0? If not, where does the broadcast data get stored?