yurtesen
Miniboss

LDS Direct Read performance

In http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.p... at section 9.3.1:

"LDS Direct reads occur in vector ALU (VALU) instructions and allow the LDS to supply a single DWORD value which is broadcast to all threads in the wavefront and is used as the SRC0 input to the ALU operations. A VALU instruction indicates that input is to be supplied by LDS by using the LDS_DIRECT for the SRC0 field."


I would like to know how many clock cycles of penalty this has compared to using data that is already in a register.

Do the ALUs have hidden registers to receive the SRC0 data? If not, where is the broadcast data stored?

1 Solution
realhet
Miniboss

Hi,

src_lds_direct takes exactly the same amount of time as a vector or a scalar register operand (measured with s_memtime).

It is like broadcasting a scalar register to the whole wavefront, except that you can have up to 16KB of constants rather than only 103*4 bytes, and the ALU can still work at maximum utilization.

SRC0 can select from 512 different things: 256 vregs, 128 sregs and 128 special things (I guess those also come from the scalar ALU). lds_direct is one of these specials. The others include many int and float constants, debug/trap registers, state flags, and even a value that marks immediate data placed right after the instruction dword.
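
To make the operand selection concrete, here is a rough Python sketch that classifies a 9-bit SRC0 code. The function name `classify_src0` is made up for illustration, and the exact code points (s0..s103 at 0..103, inline integers at 128..208, lds_direct at 254, literal marker at 255, vregs at 256..511) are my reading of the Southern Islands operand table, so treat them as assumptions and check them against the ISA document before relying on them.

```python
def classify_src0(code: int) -> str:
    """Coarsely classify a 9-bit SRC0 operand code (assumed SI layout)."""
    if not 0 <= code <= 511:
        raise ValueError("SRC0 is a 9-bit field, expected 0..511")
    if code >= 256:
        return f"v{code - 256}"        # 256 vector registers
    if code <= 103:
        return f"s{code}"              # general scalar registers
    if code == 254:
        return "lds_direct"            # DWORD broadcast from LDS
    if code == 255:
        return "literal"               # 32-bit immediate follows the instruction
    if 128 <= code <= 192:
        return f"int {code - 128}"     # inline integers 0..64
    if 193 <= code <= 208:
        return f"int {192 - code}"     # inline integers -1..-16
    return "special"                   # vcc, exec, m0, float constants, flags, ...


if __name__ == "__main__":
    for code in (0, 103, 128, 193, 254, 255, 256, 511):
        print(code, classify_src0(code))
```

The point of the sketch is only that lds_direct is one more selector value in the same field as the registers and inline constants, which fits the measurement that it costs no extra time.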
