HLSL 'clamp' vs. GLSL 'clamp' performance

Discussion created by dar1243 on Jul 16, 2009
Latest reply on Jul 23, 2009 by bpurnomo

Hi

I have this fragment shader (GLSL):

void main()
{
 gl_FragColor = clamp(gl_TexCoord[0], 0.0, 1.0);
 return;
}

Disassembly is: (RV770, latest shader analyzer)
; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(8)
      0  x: MAX         ____,  R0.w,  0.0f     
         y: MAX         ____,  R0.z,  0.0f     
         z: MAX         ____,  R0.y,  0.0f     
         w: MAX         ____,  R0.x,  0.0f     
      1  x: MIN         R0.x,  PV0.w,  1.0f     
         y: MIN         R0.y,  PV0.z,  1.0f     
         z: MIN         R0.z,  PV0.y,  1.0f     
         w: MIN         R0.w,  PV0.x,  1.0f     
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM

--> 2 ALU

Now I have this pixel shader (DX9 HLSL):

float4 main(float4 Val : TEXCOORD0) : COLOR0
{
 return clamp(Val, 0.0, 1.0);
}

Disassembly is: (RV770, latest shader analyzer)

; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(4)
      0  x: MOV         R0.x,  R0.x      CLAMP
         y: MOV         R0.y,  R0.y      CLAMP
         z: MOV         R0.z,  R0.z      CLAMP
         w: MOV         R0.w,  R0.w      CLAMP
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM

--> 1 ALU

I have some complex shaders that are ALU-bound, and they use a lot of clamp(val, 0, 1) / saturate(val).
In HLSL they take about 80 cycles to execute (estimated),
but the identical shader in GLSL takes about 120 cycles (estimated),
because each saturate/clamp (from 0 to 1) is expanded into a MAX/MIN sequence

(so the number of ALU instructions goes up as well ;/)

Is this a GLSL compiler flaw? Or is there some nasty way to force GLSL to generate
'mov reg, reg, clamp' in the microcode?

Is a fix for this expected soon?
