So the GLSL frontend should be far, far more efficient than it is now, because on average identical shaders in GLSL are about 25% slower than those in DX9 HLSL / DX10 HLSL (which IMHO is a HUGE difference).
The min/max vs. clamp thing should help, but that is not the only problem. Another example:
HLSL DX9 (PS_3_0)
float4 main(float4 UV : TEXCOORD1) : COLOR0
{
    float4 Result;
    Result.w = 1.0;
    Result.xyz = UV.xyz / sqrt(UV.w);
    return Result;
}
microcode:
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(5)
    0  w: MOV    R1.w, 1.0f
       t: RSQ_e  ____, |R0.w|
    1  x: MUL    R1.x, R0.x, PS0
       y: MUL    R1.y, R0.y, PS0
       z: MUL    R1.z, R0.z, PS0
01 EXP_DONE: PIX0, R1
END_OF_PROGRAM
GLSL
void main()
{
    gl_FragColor.w = 1.0;
    gl_FragColor.xyz = gl_TexCoord[1].xyz / sqrt(gl_TexCoord[1].w);
}
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(6)
    0  w: MOV    R1.w, 1.0f
       t: SQRT_e ____, R0.w
    1  t: RCP_e  ____, PS0
    2  x: MUL_e  R1.x, R0.x, PS1
       y: MUL_e  R1.y, R0.y, PS1
       z: MUL_e  R1.z, R0.z, PS1
01 EXP_DONE: PIX0, R1
END_OF_PROGRAM
Why the hell does the GLSL version not use the faster inverse square root opcode (RSQ) and instead use the two slower SQRT and RCP opcodes, which costs an extra ALU instruction group?!
PS. All of the disassembly above is RV770 microcode.
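A possible workaround for the SQRT + RCP case above, as a sketch only: GLSL has an inversesqrt() built-in, so the reciprocal square root can be requested explicitly instead of hoping the compiler folds the division. I am assuming here that the driver maps inversesqrt() to a single RSQ-style opcode; I have not checked the resulting disassembly.

```glsl
// Sketch only: the same fragment shader as the GLSL version above, with the
// reciprocal square root requested explicitly via the inversesqrt() built-in.
// Assumption: the compiler turns inversesqrt() into one RSQ-style opcode;
// this has not been verified against the RV770 disassembly.
void main()
{
    gl_FragColor.w = 1.0;
    // x / sqrt(w) rewritten as x * inversesqrt(w). Same value for w > 0,
    // though the last bits of precision may differ.
    gl_FragColor.xyz = gl_TexCoord[1].xyz * inversesqrt(gl_TexCoord[1].w);
}
```

Even if this helps, it only hides the problem: the frontend should be able to do this transformation by itself, as the HLSL compiler does.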