
Journeyman III

HLSL 'clamp' vs. GLSL 'clamp' performance

Hi,

I have this fragment shader (GLSL):

void main()
{
 gl_FragColor = clamp(gl_TexCoord[0], 0.0, 1.0);
 return;
}

Disassembly (RV770, latest shader analyzer):
; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(8)
      0  x: MAX         ____,  R0.w,  0.0f     
         y: MAX         ____,  R0.z,  0.0f     
         z: MAX         ____,  R0.y,  0.0f     
         w: MAX         ____,  R0.x,  0.0f     
      1  x: MIN         R0.x,  PV0.w,  1.0f     
         y: MIN         R0.y,  PV0.z,  1.0f     
         z: MIN         R0.z,  PV0.y,  1.0f     
         w: MIN         R0.w,  PV0.x,  1.0f     
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM

--> 2 ALU instruction groups (MAX, then MIN)

Now here is the equivalent pixel shader (DX9 HLSL):

float4 main(float4 Val : TEXCOORD0) : COLOR0
{
 return clamp(Val, 0.0, 1.0);
}

Disassembly (RV770, latest shader analyzer):

; --------  Disassembly --------------------
00 ALU: ADDR(32) CNT(4)
      0  x: MOV         R0.x,  R0.x      CLAMP
         y: MOV         R0.y,  R0.y      CLAMP
         z: MOV         R0.z,  R0.z      CLAMP
         w: MOV         R0.w,  R0.w      CLAMP
01 EXP_DONE: PIX0, R0
END_OF_PROGRAM

--> 1 ALU instruction group (MOV with the CLAMP modifier)

I have some complex shaders that are ALU-bound and use lots of clamp(val, 0, 1) / saturate(val).
In HLSL they take about 80 cycles to execute (estimated),
but the identical shader in GLSL takes about 120 cycles (estimated),
because each clamp to [0, 1] is expanded into a MAX/MIN sequence instead of a single MOV with CLAMP.

(so the number of ALU instructions goes up ;/)

Is this a flaw in the GLSL compiler? Or is there some trick to force GLSL to generate
'mov reg, reg, clamp' in the microcode?

Is a fix for this expected soon?

Staff

Interesting finding.  I'll pass it on to our shader compiler engineers.
