The Brook code (mersenne_twister.br) contains the following set of lines, repeated for each output stream A1 ... A8:

A1.x = a.x ^ e.x ^ ((b.x >> thirteen) & mask11) ^ f.x ^ (r2.x << fifteen);

A1.y = a.y ^ e.y ^ ((b.y >> thirteen) & mask12) ^ f.y ^ (r2.y << fifteen);

A1.z = a.z ^ e.z ^ ((b.z >> thirteen) & mask13) ^ f.z ^ (r2.z << fifteen);

A1.w = a.w ^ e.w ^ ((b.w >> thirteen) & mask14) ^ f.w ^ (r2.w << fifteen);

where mask11, mask12, mask13 and mask14 are unsigned ints. Why can't this be written as

A1 = a ^ e ^ ((b >> thirteen) & mask) ^ f ^ (r2 << fifteen);

where thirteen is now uint4(13U, 13U, 13U, 13U),

mask = uint4(mask11, mask12, mask13, mask14) and

fifteen is now uint4(15U, 15U, 15U, 15U).

Am I missing an efficiency issue, or even worse, are they not equivalent?

Thank you for pointing this out. This sample was written before the int vector types were fully supported. Your vectorized version uses them efficiently. Let us know how much improvement you see after this change.
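If you want to convince yourself that the two forms agree before changing the kernel, a host-side check along these lines may help. This is only a sketch in plain C++, not Brook+: the `uint4` struct, the overloaded operators, and the `scalar_form` / `vector_form` helpers are hypothetical stand-ins, and it assumes the vector operators act independently on each component.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Brook+ uint4 type (assumption, not the real type).
struct uint4 {
    std::uint32_t x, y, z, w;
};

// Component-wise operators: each lane is computed independently of the others.
inline uint4 operator^(uint4 p, uint4 q) { return {p.x ^ q.x, p.y ^ q.y, p.z ^ q.z, p.w ^ q.w}; }
inline uint4 operator&(uint4 p, uint4 q) { return {p.x & q.x, p.y & q.y, p.z & q.z, p.w & q.w}; }
inline uint4 operator>>(uint4 p, uint4 s) { return {p.x >> s.x, p.y >> s.y, p.z >> s.z, p.w >> s.w}; }
inline uint4 operator<<(uint4 p, uint4 s) { return {p.x << s.x, p.y << s.y, p.z << s.z, p.w << s.w}; }
inline bool operator==(uint4 p, uint4 q) { return p.x == q.x && p.y == q.y && p.z == q.z && p.w == q.w; }

// Per-component form, written the way the sample does it.
// mask.x .. mask.w play the role of mask11 .. mask14.
uint4 scalar_form(uint4 a, uint4 b, uint4 e, uint4 f, uint4 r2, uint4 mask) {
    const std::uint32_t thirteen = 13U, fifteen = 15U;
    uint4 A1;
    A1.x = a.x ^ e.x ^ ((b.x >> thirteen) & mask.x) ^ f.x ^ (r2.x << fifteen);
    A1.y = a.y ^ e.y ^ ((b.y >> thirteen) & mask.y) ^ f.y ^ (r2.y << fifteen);
    A1.z = a.z ^ e.z ^ ((b.z >> thirteen) & mask.z) ^ f.z ^ (r2.z << fifteen);
    A1.w = a.w ^ e.w ^ ((b.w >> thirteen) & mask.w) ^ f.w ^ (r2.w << fifteen);
    return A1;
}

// Vector form, as proposed in the question.
uint4 vector_form(uint4 a, uint4 b, uint4 e, uint4 f, uint4 r2, uint4 mask) {
    const uint4 thirteen = {13U, 13U, 13U, 13U};
    const uint4 fifteen = {15U, 15U, 15U, 15U};
    return a ^ e ^ ((b >> thirteen) & mask) ^ f ^ (r2 << fifteen);
}

// Compares both forms on one set of inputs; asserts if any lane differs.
void check_forms_agree(uint4 a, uint4 b, uint4 e, uint4 f, uint4 r2, uint4 mask) {
    assert(scalar_form(a, b, e, f, r2, mask) == vector_form(a, b, e, f, r2, mask));
}
```

Calling `check_forms_agree` with a handful of arbitrary inputs (including the real mask values from the sample) is a cheap way to confirm that no lane depends on another. It says nothing about performance on the device; that still has to be measured on the actual kernel.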