I'm using AMD-APP (1214.3). My OpenCL code is as follows:
// W is an uint4 variable
uint4 T = (uint4)(1U, 2U, 3U, 4U);
T += W;
After compilation, I saw that the IL uses several add instructions just to build the uint4 constant:
dcl_literal l16, 0x00000001, 0x00000001, 0x00000001, 0x00000001
dcl_literal l19, 0x00000002, 0x00000002, 0x00000002, 0x00000002
dcl_literal l18, 0x00000003, 0x00000003, 0x00000003, 0x00000003
dcl_literal l17, 0x00000004, 0x00000004, 0x00000004, 0x00000004
mov r66, l16
iadd r66, r66.xyz0, l17.000x
iadd r66, r66.xy0w, l18.00x0
iadd r66, r66.x0zw, l19.0x00
iadd r75, r75, r66
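To spell out how I read that sequence, here is a small plain-C emulation of the mov and the three iadd instructions that build the constant. This is only a sketch: the swizzle semantics (a `0` selecting a literal zero, `.000x` placing the x component in the w slot, and so on) are my assumption from the listing, and `u4`, `iadd` and `build_t` are hypothetical names for this illustration, not AMD IL or OpenCL APIs.

```c
#include <stdint.h>

/* Plain-C stand-in for a 4-component IL register (x, y, z, w). */
typedef struct { uint32_t x, y, z, w; } u4;

/* Component-wise add, standing in for the IL iadd instruction. */
static u4 iadd(u4 a, u4 b) {
    u4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Emulates the mov + three iadd sequence that builds T.
   Swizzle interpretation is assumed, see the note above. */
static u4 build_t(void) {
    const u4 l16 = {1, 1, 1, 1};
    const u4 l19 = {2, 2, 2, 2};
    const u4 l18 = {3, 3, 3, 3};
    const u4 l17 = {4, 4, 4, 4};

    u4 r66 = l16;                                   /* mov r66, l16 */
    /* iadd r66, r66.xyz0, l17.000x : replace w with 4 */
    r66 = iadd((u4){r66.x, r66.y, r66.z, 0}, (u4){0, 0, 0, l17.x});
    /* iadd r66, r66.xy0w, l18.00x0 : replace z with 3 */
    r66 = iadd((u4){r66.x, r66.y, 0, r66.w}, (u4){0, 0, l18.x, 0});
    /* iadd r66, r66.x0zw, l19.0x00 : replace y with 2 */
    r66 = iadd((u4){r66.x, 0, r66.z, r66.w}, (u4){0, l19.x, 0, 0});
    return r66;
}
```

Under that reading, `build_t()` ends up holding the components 1, 2, 3, 4, so the first four instructions only assemble the constant and the final `iadd r75, r75, r66` is the actual `T += W`.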
So, how can I write the OpenCL code so that fewer instructions are generated? For example, a single literal load followed by one iadd, like the following:
dcl_literal l16, 0x00000001, 0x00000002, 0x00000003, 0x00000004
mov r66, l16
iadd r75, r75, r66
Thanks