drstrip
Journeyman III

efficiency of vector types (eg, int4)

I want to make sure I understand vector types and how they execute.

Assume a, b, c  are int4.

If I write

c=a+b;

then all four components are added pairwise simultaneously in a single thread processor, in a single instruction, using the four "normal" stream cores.

If, on the other hand, I declare ax, ay, az, aw, bx, ... as int and write

cx = ax + bx;

cy = ay + by;

cz = az + bz;

cw = aw + bw;

then in theory the compiler could optimize this by organizing the storage the same way as the int4 and adding the components the way it does for the int4, but that's a hell of an optimization to count on, especially when you can ensure it by using int4.

Is this correct?

ryta1203
Journeyman III

Yes. There is no data dependency between the four scalar additions, so these will "pack" into a single instruction as well.
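
To make the dependency point concrete, here is a rough host-side sketch in plain C, not device code. The `int4` struct and the names `add_vector`, `add_scalars`, `add_chain` and `int4_equal` are hypothetical stand-ins made up for illustration. It only shows that the two forms from the question compute the same thing and that the four scalar adds never read each other's results; whether a given compiler actually packs them is not something this code can demonstrate.

```c
/* Hypothetical stand-in for the device-side int4 type (assumption:
   four int components named x, y, z, w). */
typedef struct { int x, y, z, w; } int4;

/* Vector form: models c = a + b on int4, one add per component. */
int4 add_vector(int4 a, int4 b) {
    int4 c = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return c;
}

/* Scalar form from the question: four separate int additions.
   No statement reads a value produced by another statement, so the
   four adds are independent and are candidates for packing. */
int4 add_scalars(int ax, int ay, int az, int aw,
                 int bx, int by, int bz, int bw) {
    int cx = ax + bx;
    int cy = ay + by;
    int cz = az + bz;
    int cw = aw + bw;
    int4 c = { cx, cy, cz, cw };
    return c;
}

/* Counter-example: each add needs the previous result, so the adds
   form a dependent chain and generally cannot be issued together. */
int add_chain(int a, int b, int c, int d) {
    int t = a + b;
    t = t + c;   /* depends on the first add */
    t = t + d;   /* depends on the second add */
    return t;
}

/* Returns 1 if all four components match, 0 otherwise. */
int int4_equal(int4 p, int4 q) {
    return p.x == q.x && p.y == q.y && p.z == q.z && p.w == q.w;
}
```

Calling `add_vector` and `add_scalars` with the same inputs gives the same four components, while `add_chain` is the kind of code where the adds have to run one after another.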
