1 Reply Latest reply on Dec 4, 2013 3:03 PM by realhet

    How should I code in OpenCL for fewer load instructions?


      I'm using AMD-APP (1214.3). My OpenCL code is as follows:

          // W is a uint4 variable

          uint4 T = (uint4)(1U, 2U, 3U, 4U);

          T += W;


      After compilation, I saw that the IL uses several addition instructions just to build the uint4 constant:


      dcl_literal l16, 0x00000001, 0x00000001, 0x00000001, 0x00000001

      dcl_literal l19, 0x00000002, 0x00000002, 0x00000002, 0x00000002

      dcl_literal l18, 0x00000003, 0x00000003, 0x00000003, 0x00000003

      dcl_literal l17, 0x00000004, 0x00000004, 0x00000004, 0x00000004

              mov r66, l16

              iadd r66, r66.xyz0, l17.000x

              iadd r66, r66.xy0w, l18.00x0

              iadd r66, r66.x0zw, l19.0x00

              iadd r75, r75, r66


      So how can I write the OpenCL code so that the compiler emits fewer instructions? For example, a single literal load followed by the iadd, like the following:

      dcl_literal l16, 0x00000001, 0x00000002, 0x00000003, 0x00000004

              mov r66, l16

              iadd r75, r75, r66