Archives Discussions

dar1243 · ‎07-16-2009

Hi

I have this fragment shader (GLSL)

void main()
{
gl_FragColor = clamp(gl_TexCoord[0], 0.0, 1.0);
return;
}

Disassembly is: (RV770, latest shader analyzer)
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(8)
      0 x: MAX         ____, R0.w, 0.0f
         y: MAX         ____, R0.z, 0.0f
         z: MAX         ____, R0.y, 0.0f
         w: MAX         ____, R0.x, 0.0f
      1 x: MIN         R0.x, PV0.w, 1.0f
         y: MIN         R0.y, PV0.z, 1.0f
         z: MIN         R0.z, PV0.y, 1.0f
         w: MIN         R0.w, PV0.x, 1.0f
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM

--> 2 ALU

Now compare this pixel shader (DX9 HLSL)

float4 main(float4 Val : TEXCOORD0) : COLOR0
{
return clamp(Val, 0.0, 1.0);
}

Disassembly is: (RV770, latest shader analyzer)

; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(4)
      0 x: MOV         R0.x, R0.x      CLAMP
         y: MOV         R0.y, R0.y      CLAMP
         z: MOV         R0.z, R0.z      CLAMP
         w: MOV         R0.w, R0.w      CLAMP
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM

--> 1 ALU

I have some complex shaders that are ALU bound (and they use lots of clamp(val, 0, 1) / saturate(val)).
In HLSL they take about 80 cycles to execute (estimated),
but the identical shader in GLSL takes about 120 cycles (estimated),
because each saturate/clamp (from 0 to 1) is expanded into a max/min sequence

(and the number of ALU instructions goes up ;/)
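
To make the comparison concrete, here is a minimal CPU-side sketch in C of the two forms seen in the disassemblies above. It is only a model for ordinary finite inputs: the function names (clamp01_max_min, mov_with_clamp, check_equivalence) are made up for illustration, and the assumption that the CLAMP output modifier simply saturates the written value to [0, 1] is mine, not something taken from the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Two-step form, mirroring the MAX-then-MIN pair in the GLSL disassembly. */
float clamp01_max_min(float v)
{
    float lower_bounded = (v > 0.0f) ? v : 0.0f;          /* MAX ____, v, 0.0f  */
    return (lower_bounded < 1.0f) ? lower_bounded : 1.0f; /* MIN dst, PV, 1.0f  */
}

/* Hypothetical model of "MOV dst, src CLAMP": a plain move whose result
 * is assumed to be saturated to [0, 1] as it is written. */
float mov_with_clamp(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Checks that both forms agree on a few finite sample values.
 * NaN inputs are deliberately left out of this sketch. */
void check_equivalence(void)
{
    const float samples[] = { -2.5f, 0.0f, 0.25f, 1.0f, 1.5f, 1e30f };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(clamp01_max_min(samples[i]) == mov_with_clamp(samples[i]));
    }
}
```

Calling check_equivalence() passes for these samples, so under the stated assumption the single MOV with CLAMP should be a valid replacement for the MAX/MIN pair whenever the bounds are exactly 0 and 1.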

Is this a GLSL compiler flaw? Or is there some nasty way to force GLSL to generate
'mov reg, reg, clamp' in microcode?

Is a fix for this expected soon?

bpurnomo · ‎07-23-2009

Interesting finding. I'll pass it on to our shader compiler engineers.

HLSL 'clamp' vs. GLSL 'clamp' performance