11 Replies Latest reply on Mar 18, 2009 7:09 AM by empty_knapsack

    calCL ignoring 0x80000000 constant when adding?

    empty_knapsack
      iadd reg,0x80000000 == nothing

      I've AMD IL code that looks like:

      dcl_literal l12,0x80000000,0x80000000,0x80000000,0x80000000

      ...
      mov r45.x,l12.x

      ...
      iadd r34.x,r34.x,r45.x

      If the constant for iadd is 0x80000000, calCL simply ignores it: no ADD_INT instruction is generated in the compiled code. When I change the constant to anything else (like 0x7fffffff or 0x80000001), everything is OK.

      I started to think I had missed something in the declaration of the constant (signed/unsigned), but ixor r34.x,r34.x,r45.x works fine, with the correct XOR_INT instruction generated.

      Is this a calCL compiler/optimizer bug, or am I missing something, not being very familiar with AMD IL at the moment?

       

        • calCL ignoring 0x80000000 constant when adding?
          MicahVillmow

          empty_knapsack, 

           If it is  possible, can you email a test shader showing this to streamdeveloper@amd.com and have them forward it to me so I can verify and work on getting it fixed?

           

          Thanks,

            • calCL ignoring 0x80000000 constant when adding?
              empty_knapsack

               

               If it is  possible, can you email a test shader showing this to streamdeveloper@amd.com

               



              Done.

              In fact, the shader is simple enough to just post here:

              il_ps_2_0
              dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
              dcl_output_generic o0
              dcl_output_generic o1
              dcl_output_generic o2
              dcl_cb cb0[4]
              dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
              dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
              dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
              sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx
              sample_resource(1)_sampler(0) r1, vWinCoord0.xyxx
              sample_resource(2)_sampler(0) r2, vWinCoord0.xyxx
              dcl_literal l1,0x7fffffff,0x7fffffff,0x7fffffff,0x7fffffff
              dcl_literal l2,0x80000000,0x80000000,0x80000000,0x80000000
              dcl_literal l3,0x80000001,0x80000001,0x80000001,0x80000001

              iadd r10.x,r0.x,l1.x
              ixor r10.x,r10.x,l2.x

              iadd r11.x,r1.x,l2.x
              ixor r11.x,r11.x,l3.x

              iadd r12.x,r2.x,l3.x
              ixor r12.x,r12.x,l1.x

              mov o0,r10
              mov o1,r11
              mov o2,r12
              end

              There is no add of 0x80000000 in the compiled output, only 2 adds and 3 xors.
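For comparison, here is a minimal Python sketch of what the shader above should compute for the .x component, next to what it would produce if the 0x80000000 add is dropped. It assumes iadd wraps modulo 2^32 and that the sampled values are treated as raw 32-bit patterns; the function names are hypothetical and only model the arithmetic, not CAL itself.

```python
MASK = 0xFFFFFFFF  # keep results in 32 bits, like the GPU integer ALU

# Literals l1, l2, l3 from the shader above
L1 = 0x7FFFFFFF
L2 = 0x80000000
L3 = 0x80000001


def iadd(a, b):
    """32-bit wrapping integer add (assumed behaviour of IL iadd)."""
    return (a + b) & MASK


def expected_outputs(r0, r1, r2):
    """The .x components of o0, o1, o2 as the IL source describes them."""
    o0 = iadd(r0, L1) ^ L2
    o1 = iadd(r1, L2) ^ L3
    o2 = iadd(r2, L3) ^ L1
    return o0, o1, o2


def outputs_with_dropped_add(r0, r1, r2):
    """Same as above, but with the iadd of 0x80000000 silently removed."""
    o0, _, o2 = expected_outputs(r0, r1, r2)
    o1 = r1 ^ L3  # only the xor survives
    return o0, o1, o2


if __name__ == "__main__":
    for name, fn in (("expected", expected_outputs),
                     ("dropped add", outputs_with_dropped_add)):
        print(name, [hex(v) for v in fn(0, 5, 0)])
```

If the add is really missing, only the second output should differ from the expected one, and only in its top bit, which makes the problem easy to spot on known input data.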

                • calCL ignoring 0x80000000 constant when adding?
                  rick.weber

                  What is the range of values you're sampling? Also, what result is placed into the output stream o0? Is it just the value you sampled, XORed with INT_MIN+1? Finally, do you know the addition isn't taking place because your result isn't what you expected, or did you disassemble the kernel?

                    • calCL ignoring 0x80000000 constant when adding?
                      empty_knapsack

                       

                      Originally posted by: rick.weber What is the range of values you're sampling? Also, what result is placed into the output stream o0? Is it just the value you sampled, XORed with INT_MIN+1? Finally, do you know the addition isn't taking place because your result isn't what you expected, or did you disassemble the kernel?

                      I disassembled the kernel after I got wrong results on several sets of sample data. It's actually easy to see how this kernel is compiled using Stream KernelAnalyzer: just copy and paste the kernel there. The results (for RV770) are:

                      ; --------  Disassembly --------------------
                      00 TEX: ADDR(64) CNT(3) VALID_PIX
                            0  SAMPLE R1.x___, R0.xyxx, t0, s0  UNNORM(XYZW)
                            1  SAMPLE R2.x___, R0.xyxx, t2, s0  UNNORM(XYZW)
                            2  SAMPLE R0.x___, R0.xyxx, t1, s0  UNNORM(XYZW)
                      01 ALU: ADDR(32) CNT(19)
                            3  x: ADD_INT     ____,  R2.x,  (0x80000001, -1.401298464e-45f).x     
                               y: ADD_INT     ____,  (0x7FFFFFFF, 1.#QNANf).y,  R1.x     
                               z: XOR_INT     R0.z,  R0.x,  (0x80000001, -1.401298464e-45f).x      VEC_201
                            4  x: XOR_INT     R0.x,  (0x7FFFFFFF, 1.#QNANf).x,  PV3.x     
                               w: XOR_INT     R0.w,  PV3.y,  (0x80000000, 0.0f).y     
                            5  x: MOV         R3.x,  PV4.x     
                               y: MOV         R3.y,  R0.y     
                               z: MOV         R3.z,  R0.y     
                               w: MOV         R3.w,  R0.y     
                            6  x: MOV         R2.x,  R0.z     
                               y: MOV         R2.y,  R0.y     
                               z: MOV         R2.z,  R0.y     
                               w: MOV         R2.w,  R0.y     
                            7  x: MOV         R1.x,  R0.w     
                               y: MOV         R1.y,  R0.y     
                               z: MOV         R1.z,  R0.y     
                               w: MOV         R1.w,  R0.y     
                      02 EXP_DONE: PIX0, R1  BRSTCNT(2)
                      END_OF_PROGRAM

                       

                      As you can see, there is no ADD_INT for 0x80000000. I suspect the reason is that 0x80000000, read as a float bit pattern, is -0.0f, and calCL decided to remove the "unnecessary" addition of "zero", wrongly assuming that integer addition behaves like float addition here.
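The suspicion above can be checked on the host with a small Python sketch. It only illustrates the arithmetic (it says nothing about what calCL does internally), and the helper names are made up for this example. It shows that the bit pattern 0x80000000 is -0.0 as a float, that adding -0.0 leaves a float unchanged, and that the same constant is not a no-op for a wrapping 32-bit integer add.

```python
import math
import struct

MASK = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_bits(value):
    """Reinterpret a single-precision float as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def iadd(a, b):
    """32-bit wrapping integer add."""
    return (a + b) & MASK


neg_zero = bits_to_float(SIGN_BIT)

# As a float, 0x80000000 is negative zero: equal to 0.0, but with the sign set.
is_negative_zero = (neg_zero == 0.0 and math.copysign(1.0, neg_zero) == -1.0)

# Float view: x + (-0.0) gives back x, so a float optimizer may drop the add.
float_add_is_noop = float_to_bits(1.5 + neg_zero) == float_to_bits(1.5)

# Integer view: adding 0x80000000 flips the top bit, so it is not a no-op.
int_add_result = iadd(5, SIGN_BIT)  # 0x80000005

if __name__ == "__main__":
    print(is_negative_zero, float_add_is_noop, hex(int_add_result))
```

A side observation: modulo 2^32, adding 0x80000000 gives the same result as XORing with 0x80000000, since both just flip the top bit. Because the XOR_INT with this constant does appear in the disassembly, replacing the iadd with an ixor might serve as a workaround, though that is untested against calCL.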