LDS Direct Read performance

Question asked by yurtesen on May 26, 2014
Latest reply on May 27, 2014 by realhet

Section 9.3.1 of http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf states:

 

LDS Direct reads occur in vector ALU (VALU) instructions and allow the LDS to supply a single DWORD value which is broadcast to all threads in the wavefront and is used as the SRC0 input to the ALU operations. A VALU instruction indicates that input is to be supplied by LDS by using the LDS_DIRECT for the SRC0 field.
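
To make sure I read the quoted paragraph correctly, here is a minimal behavioral sketch in Python of what I understand it to describe. It models only the semantics (one DWORD read from LDS, the same value seen by every lane as SRC0), not timing or the actual hardware datapath. The function name, the `lds_dword_index` parameter, and the choice of an add operation are my own assumptions for illustration; the wavefront size of 64 is the Southern Islands wavefront size.

```python
WAVEFRONT_SIZE = 64  # Southern Islands wavefronts contain 64 work-items


def valu_add_lds_direct(lds, lds_dword_index, src1_per_lane):
    """Behavioral model of a VALU add whose SRC0 comes from LDS_DIRECT.

    Hypothetical helper: it reads one DWORD from `lds` and uses that same
    value as SRC0 for every lane, while SRC1 differs per lane.
    """
    assert len(src1_per_lane) == WAVEFRONT_SIZE
    src0 = lds[lds_dword_index]  # a single DWORD is fetched from LDS
    # The same src0 is broadcast to all lanes; results wrap at 32 bits.
    return [(src0 + src1) & 0xFFFFFFFF for src1 in src1_per_lane]


# Example: LDS holds 100 at DWORD index 3; lane i supplies SRC1 = i.
lds = [0] * 16
lds[3] = 100
result = valu_add_lds_direct(lds, 3, list(range(WAVEFRONT_SIZE)))
```

In this model every lane adds the same LDS value to its own register operand. What the sketch cannot tell me is what that broadcast costs in hardware, which is my actual question.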

 

I would like to know how many clock cycles of penalty an LDS direct read incurs compared to using data that is already in a register.

 

Do the ALUs have hidden registers that receive the data for SRC0? If not, where is the broadcast data stored?
