1 Reply Latest reply on Jul 23, 2009 1:34 PM by bpurnomo

    HLSL 'clamp' vs. GLSL 'clamp' performance

    dar1243

      Hi

      I have this fragment shader (GLSL):

      void main()
      {
       gl_FragColor = clamp(gl_TexCoord[0], 0.0, 1.0);
       return;
      }

      Disassembly (RV770, latest Shader Analyzer):
      ; --------  Disassembly --------------------
      00 ALU: ADDR(32) CNT(8)
            0  x: MAX         ____,  R0.w,  0.0f     
               y: MAX         ____,  R0.z,  0.0f     
               z: MAX         ____,  R0.y,  0.0f     
               w: MAX         ____,  R0.x,  0.0f     
            1  x: MIN         R0.x,  PV0.w,  1.0f     
               y: MIN         R0.y,  PV0.z,  1.0f     
               z: MIN         R0.z,  PV0.y,  1.0f     
               w: MIN         R0.w,  PV0.x,  1.0f     
      01 EXP_DONE: PIX0, R0
      END_OF_PROGRAM

      --> 2 ALU

      Now the equivalent pixel shader (DX9 HLSL):

      float4 main(float4 Val : TEXCOORD0) : COLOR0
      {
       return clamp(Val, 0.0, 1.0);
      }

      Disassembly (RV770, latest Shader Analyzer):

      ; --------  Disassembly --------------------
      00 ALU: ADDR(32) CNT(4)
            0  x: MOV         R0.x,  R0.x      CLAMP
               y: MOV         R0.y,  R0.y      CLAMP
               z: MOV         R0.z,  R0.z      CLAMP
               w: MOV         R0.w,  R0.w      CLAMP
      01 EXP_DONE: PIX0, R0
      END_OF_PROGRAM

      --> 1 ALU

      I have some complex shaders that are ALU bound and use lots of clamp(val, 0, 1) / saturate(val).
      In HLSL they take about 80 cycles to execute (estimated),
      but the identical shader in GLSL takes about 120 cycles (estimated), roughly 50% more,
      because each saturate/clamp to [0, 1] is expanded into a MAX/MIN sequence
      and the number of ALU instructions goes up ;/

      Is this a flaw in the GLSL compiler? Or is there some nasty trick to force GLSL to generate
      'mov reg, reg, clamp' in the microcode?

      Is a fix for this expected soon?