There is one vector add in my Brook source code:
y = x + uint4(1u, 2u, 3u, 4u);
Both x and y are uint4.
I compiled the code with brcc. Part of the generated IL code is:
"dcl_literal l12,0x00000001,0x00000001,0x00000001,0x00000001\n"
"dcl_literal l13,0x00000002,0x00000002,0x00000002,0x00000002\n"
"dcl_literal l14,0x00000003,0x00000003,0x00000003,0x00000003\n"
"dcl_literal l15,0x00000004,0x00000004,0x00000004,0x00000004\n"
"mov r285.x___,l12.x000\n"
"mov r272.x___,r285.x000\n"
"mov r286.x___,l13.x000\n"
"mov r272._y__,r286.0x00\n"
"mov r287.x___,l14.x000\n"
"mov r272.__z_,r287.00x0\n"
"mov r288.x___,l15.x000\n"
"mov r272.___w,r288.000x\n"
"iadd r273.xyzw,r269.xyzw,r272.xyzw\n"
The generated IL code looks quite inefficient. With my basic understanding of IL, it could be optimized like this:
"dcl_literal l12,0x00000001,0x00000002,0x00000003,0x00000004\n"
"iadd r273.xyzw,r269.xyzw,l12.xyzw\n"
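To check that the two sequences really compute the same thing, here is a minimal Python sketch that models them. It is only my own model of the IL semantics, not anything brcc produces: I am assuming that a source swizzle like x000 selects a component per letter and a constant per digit, that a destination mask like _y__ writes only the named components, and that iadd wraps at 32 bits. The helper names (swizzle, mov, iadd, generated_il, compact_il) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(src, sel):
    # Assumed source-swizzle model: a letter picks a component, a digit is a constant.
    return [src[COMPONENT[c]] if c in COMPONENT else int(c) for c in sel]

def mov(dst, mask, src):
    # Assumed write-mask model: only components not marked '_' are overwritten.
    return [s if m != "_" else d for d, m, s in zip(dst, mask, src)]

def iadd(a, b):
    # Component-wise 32-bit integer add with wraparound.
    return [(p + q) & MASK32 for p, q in zip(a, b)]

def generated_il(x):
    # Mirrors the brcc output: four splatted literals, eight movs, one iadd.
    l12, l13, l14, l15 = [1] * 4, [2] * 4, [3] * 4, [4] * 4
    r272 = [0] * 4
    steps = [
        (l12, "x___", "x000"),
        (l13, "_y__", "0x00"),
        (l14, "__z_", "00x0"),
        (l15, "___w", "000x"),
    ]
    for literal, mask, sel in steps:
        tmp = mov([0] * 4, "x___", swizzle(literal, "x000"))  # mov r28N.x___, l1N.x000
        r272 = mov(r272, mask, swizzle(tmp, sel))             # mov r272.<mask>, r28N.<sel>
    return iadd(x, r272)

def compact_il(x):
    # Mirrors the hand-written version: one literal, one iadd.
    l12 = [1, 2, 3, 4]
    return iadd(x, swizzle(l12, "xyzw"))

x = [10, 20, 30, 0xFFFFFFFF]
print(generated_il(x) == compact_il(x))
```

Under these assumptions both versions give the same uint4 for any input, so the eight mov instructions and three of the four literals look redundant.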
Is there any way to write the .br source code so that the brcc compiler generates the compact version of the IL code above?