Section 9.3.1 states:
LDS Direct reads occur in vector ALU (VALU) instructions and allow the LDS to
supply a single DWORD value which is broadcast to all threads in the wavefront
and is used as the SRC0 input to the ALU operations. A VALU instruction
indicates that input is to be supplied by LDS by using the LDS_DIRECT for the
I would like to know how many clock cycles of penalty this incurs compared to using data that is already in a register.
Do the ALUs have hidden registers that receive the data for SRC0? If not, where is the broadcast data stored?
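
To make the question concrete, here is how I read the functional behavior in the quoted passage, as a small Python model. It is only a sketch: the 64-lane wavefront size, the function name `valu_add_lds_direct`, and the choice of a 32-bit add as the ALU operation are my own assumptions for illustration. It says nothing about timing or about where the hardware holds the broadcast value, which is exactly what I am asking.

```python
WAVEFRONT_SIZE = 64  # assumption: one wavefront has 64 lanes (threads)
MASK32 = 0xFFFFFFFF  # a DWORD is 32 bits


def valu_add_lds_direct(lds, dword_index, src1):
    """Model a VALU add whose SRC0 comes from LDS Direct (hypothetical helper).

    A single DWORD is read from LDS once and the same value is used as
    SRC0 by every lane; SRC1 is a normal per-lane vector operand.
    """
    src0 = lds[dword_index] & MASK32  # one read, shared by all lanes
    return [(src0 + lane_value) & MASK32 for lane_value in src1]


# Example: LDS holds 100 at DWORD 3; each lane's SRC1 is its lane id.
lds = [0] * 16
lds[3] = 100
src1 = list(range(WAVEFRONT_SIZE))
dst = valu_add_lds_direct(lds, 3, src1)
print(dst[:4])
```

In this model every lane sees the same SRC0, so lane i ends up with 100 + i. What the model cannot show is whether the hardware latches that DWORD somewhere before the ALU uses it, or how many cycles the read adds.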