We finally tracked down and characterized one of the more elusive bugs that has been causing many problems in our code.
It shows up when bytealign/bitalign (bitalign only with whole-byte shift values) is used inside a macro called from a function, with literal values and with the same value passed as both src0 and src1 to implement a rotation. (Convoluted conditions?)
The optimizer sees the constants and decides to pre-rotate for you. Bitalign and bytealign rotate *RIGHT*. When the optimizer does the rotation for you, it *INCORRECTLY* rotates to the left. Unfortunately, in our code this same call is not always made with constant values. The initial conditions of the loop are literals, but as we get data off the network stream it is combined with the constant data, so the same call operates on non-constant data later on.
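To make the right-versus-left difference concrete, here is a small Python model. It is only a sketch: it assumes bitalign/bytealign behave as a 64-bit funnel shift (src0 in the high half, src1 in the low half, shifted right, low 32 bits kept), which is how the OpenCL amd_bitalign/amd_bytealign built-ins are described. The function names and the rotl32 helper are made up for illustration and are not taken from the compiler.

```python
MASK32 = 0xFFFFFFFF

def bitalign(src0, src1, shift):
    # Assumed semantics: low 32 bits of (src0:src1) >> (shift & 31).
    wide = ((src0 & MASK32) << 32) | (src1 & MASK32)
    return (wide >> (shift & 31)) & MASK32

def bytealign(src0, src1, nbytes):
    # Assumed semantics: same funnel shift, counted in bytes (0..3).
    wide = ((src0 & MASK32) << 32) | (src1 & MASK32)
    return (wide >> ((nbytes & 3) * 8)) & MASK32

def rotl32(x, n):
    # Hypothetical helper: 32-bit rotate LEFT, the direction the
    # folded constant appears to take.
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

x = 0x00010203
runtime_result = bitalign(x, x, 8)   # src0 == src1: rotate right by 8 bits
byte_result = bytealign(x, x, 1)     # same rotation, expressed in bytes
folded_result = rotl32(x, 8)         # what a left rotation would give instead

print(hex(runtime_result), hex(byte_result), hex(folded_result))
```

With x = 0x00010203 the right rotation gives 0x03000102, while a left rotation by the same amount gives 0x01020300, so a kernel that mixes a folded first pass with runtime passes ends up combining bytes in two different orders.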
A colleague made a pretty simple test case showing this:
dcl_literal l5, 0, 8, 0x30, 4
imul r42.z, l5.z, vAbsTidFlat.x
dcl_literal l16, 0, 16, 0, 0
dcl_literal l17, 1, 2, 3, 4
dcl_literal l19, 0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f
mov r43.x, r42.z
iadd r43.y, r43.x, l16.y
uav_raw_store_id(7) mem, r43.x, l19
uav_raw_store_id(7) mem, r43.y, r0
Oh, and bitalign only has trouble with shift values of 8, 16, and 24 (and presumably 32, 48, etc. if you pass them).
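One possible reading of why only those shift values are affected: under the same assumed funnel-shift model (not confirmed against the compiler), a bitalign by 8, 16, or 24 bits produces exactly what a bytealign by 1, 2, or 3 bytes produces. The sketch below checks that equivalence; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def funnel_shift_right(src0, src1, bits):
    # Assumed model: low 32 bits of the 64-bit value (src0:src1) >> bits.
    wide = ((src0 & MASK32) << 32) | (src1 & MASK32)
    return (wide >> bits) & MASK32

def bitalign(src0, src1, shift):
    # Assumption: the shift amount is masked to 5 bits.
    return funnel_shift_right(src0, src1, shift & 31)

def bytealign(src0, src1, nbytes):
    # Assumption: the byte count is masked to 2 bits.
    return funnel_shift_right(src0, src1, (nbytes & 3) * 8)

x = 0x00010203
# For whole-byte shifts, compare bitalign against the matching bytealign.
matches = {s: bitalign(x, x, s) == bytealign(x, x, s // 8) for s in (8, 16, 24)}
print(matches)
```

If the shift really is masked to five bits as assumed here, a shift of 32 would behave like 0 and 48 like 16, so whether those values misbehave is something to confirm on hardware rather than take from this model.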