Efficiency of vector types (e.g., int4)

Discussion created by drstrip on Apr 3, 2010
Latest reply on Apr 4, 2010 by ryta1203

I want to make sure I understand vector types and how they execute.

Assume a, b, and c are int4.

If I write

c = a + b;

then all four components are added pairwise simultaneously on a single thread processor, in a single instruction, using the four "normal" stream cores.

If, on the other hand, I declare ax, ay, az, aw, bx, ... as int and write

cx = ax + bx;

cy = ay + by;

cz = az + bz;

cw = aw + bw;

then in theory the compiler could optimize this by working out that it can lay out the storage the same way as for the int4 and perform the adds the same way, but that's a hell of an optimization to count on, especially when you can ensure it by using int4.
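
To make the comparison concrete, here are the two forms side by side as a small host-side C sketch. It assumes the GCC/Clang vector_size extension as a stand-in for OpenCL's int4, and the names add_vector, add_scalar, and check_paths_agree are made up for illustration. It only shows that the two forms compute the same values; it says nothing about how the GPU compiler actually schedules or packs the scalar adds.

```c
#include <assert.h>

/* Stand-in for OpenCL's int4 (assumption: GCC/Clang vector extension,
   compiled as ordinary host C rather than as an OpenCL kernel). */
typedef int int4 __attribute__((vector_size(4 * sizeof(int))));

/* Vector form: a single expression adds all four lanes pairwise. */
int4 add_vector(int4 a, int4 b)
{
    return a + b;
}

/* Scalar form: four independent adds, one per component,
   mirroring cx = ax + bx; ... cw = aw + bw; */
int4 add_scalar(int4 a, int4 b)
{
    int cx = a[0] + b[0];
    int cy = a[1] + b[1];
    int cz = a[2] + b[2];
    int cw = a[3] + b[3];
    int4 c = { cx, cy, cz, cw };
    return c;
}

/* Hypothetical helper: asserts that both forms agree lane by lane. */
void check_paths_agree(int4 a, int4 b)
{
    int4 v = add_vector(a, b);
    int4 s = add_scalar(a, b);
    for (int i = 0; i < 4; ++i) {
        assert(v[i] == s[i]);
    }
}
```

Calling check_paths_agree with any two int4 values that do not overflow should pass, since the results are identical either way; the open question is only whether the compiler emits the scalar version as one packed instruction.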


Is this correct?