
How should I code in OpenCL for fewer load instructions?

Question asked by jclin on Dec 3, 2013
Latest reply on Dec 4, 2013 by realhet

I'm using AMD-APP (1214.3). My OpenCL code is as follows:

    // W is a uint4 variable

    uint4 T = (uint4)(1U, 2U, 3U, 4U);

    T += W;
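
A minimal kernel for reproducing this might look like the sketch below. The kernel name `add_const` and the `in`/`out` buffers are hypothetical and only there to make the snippet self-contained; the two lines that operate on T and W are the ones from the snippet above.

```c
// Hypothetical reproduction kernel: the name and buffer arguments are
// assumptions for illustration, not taken from the original code.
__kernel void add_const(__global const uint4 *in, __global uint4 *out)
{
    size_t gid = get_global_id(0);

    uint4 W = in[gid];                  // W is a uint4 variable
    uint4 T = (uint4)(1U, 2U, 3U, 4U);  // vector literal with four distinct components
    T += W;                             // component-wise add

    out[gid] = T;                       // store the result so the add is not optimized away
}
```

Writing the result to `out` keeps the compiler from eliminating the addition, so the instructions in question should still show up in the generated IL.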


After compilation, I saw that the IL uses several iadd instructions just to build the constant uint4 vector:


dcl_literal l16, 0x00000001, 0x00000001, 0x00000001, 0x00000001

dcl_literal l19, 0x00000002, 0x00000002, 0x00000002, 0x00000002

dcl_literal l18, 0x00000003, 0x00000003, 0x00000003, 0x00000003

dcl_literal l17, 0x00000004, 0x00000004, 0x00000004, 0x00000004

        mov r66, l16

        iadd r66, r66.xyz0, l17.000x

        iadd r66, r66.xy0w, l18.00x0

        iadd r66, r66.x0zw, l19.0x00

        iadd r75, r75, r66


So how should I write the OpenCL code to get fewer instructions? For example, a single literal load followed by one iadd, like the following:

dcl_literal l16, 0x00000001, 0x00000002, 0x00000003, 0x00000004

       mov r66, l16

       iadd r75, r75, r66