Archives Discussions

tgraupmann · 12-15-2008

Can RenderMonkey generate compiled shader assembly?

Is it possible for RenderMonkey to show the compiled assembly for compiled shader code?

I'm using the Unity3D engine, which lets me view the Cg-compiled assembly. I want to compare that assembly against what RenderMonkey produces to identify problems.

Basically, I have a shader that works on the Mac but not on Windows, and I want to compare the assembly to try to find the issue.
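A comparison like this can be scripted once both listings are saved as text. The following is only a minimal sketch using Python's standard difflib module. It assumes both listings use `//` comments, as D3D disassembly does; the helper names `normalize` and `diff_listings` are made up for illustration, and the comment handling would need adjusting for other assembly dialects.

```python
import difflib
import re


def normalize(listing):
    """Drop // comments, blank lines, and redundant whitespace (assumed D3D-style syntax)."""
    lines = []
    for raw in listing.splitlines():
        code = raw.split("//", 1)[0]            # strip trailing or whole-line comments
        code = re.sub(r"\s+", " ", code).strip()  # collapse runs of whitespace
        if code:
            lines.append(code)
    return lines


def diff_listings(a, b, name_a="unity", name_b="rendermonkey"):
    """Return a unified diff of two assembly listings as a list of lines."""
    return list(
        difflib.unified_diff(
            normalize(a), normalize(b), fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


if __name__ == "__main__":
    unity = "ps_2_0\n// from Unity\ntexld r0, t0, s0\nmov oC0, r0\n"
    monkey = "ps_2_0\ntexld   r0, t0, s0\nmul r0, r0, c0\nmov oC0, r0\n"
    print("\n".join(diff_listings(unity, monkey)))
```

Normalizing first keeps comment headers and spacing differences out of the diff, so only real instruction differences show up.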

Thanks,

~Tim

bpurnomo · 12-16-2008

Hi Tim,

GPU ShaderAnalyzer can do what you are looking for.

jhejl · 01-31-2009

RenderMonkey has disassembly, at a few levels.

Open a Shader Editor window (where you actually type the shader code)

Look at the toolbar that contains the shader "Target" and "Entry Point". Locate the button "Disassembly".

Don't feel bad - for some reason, this button seems to hide from everyone. And, it's too bad -- b/c there are lots of goodies inside...

When entering the Disassembly window, you'll be presented with the D3D tokens (although they insist on calling it assembly). Notice the drop-menu near the top of this window. Here, you have access to microcode disassembly for various AMD chipsets. (Cool!)

In RM 1.81, the highest chipset supported is the R580, so this feature was much cooler in 2006. RenderMonkey hasn't been updated since, so there is no microcode for R600/R700.

AMD: Obvious request... Can we get microcode for all of your chipsets inside RM? Thank you!

Anyway..... if you have not seen this feature before, I'll give a quick demo:

This is the fragment HLSL for the "Polynomial Texture Map" sample that ships with RM:

float mode;
sampler a012_map;
sampler a345_map;
sampler normalizer;
sampler rgb_map;

//------------------------------------------------------------------//
// Polynomial texture map lighting                                  //
//                                                                  //
// (C) Nathaniel Hoffman, 2003                                      //
//                                                                  //
// Based on 'Polynomial Texture Maps', SIGGRAPH 2001, by Tom        //
// Malzender, Dan Gelb and Hans Wolters from HP Labs                //
//------------------------------------------------------------------//
float4 main( float4 Tex: TEXCOORD0, float3 Light: TEXCOORD1 ) : COLOR
{
   float3 lu2_lv2_lulv;
   float4 c;
   float3 a012;
   float3 a345;

   // Normalize light direction
   Light = texCUBE(normalizer, Light) * 2.0 - 1.0;

   // z-extrapolation
   if (mode > 0.0f && Light.z < 0.0)
   {
      Light.xy = normalize(Light.xy);
      Light.xy *= (1.0 - Light.z);
   }
   Light.z = 1.0;

   // Prepare higher-order terms
   lu2_lv2_lulv = Light.xyx * Light.xyy;

   // read higher-order coeffs from texture and unbias
   a012 = tex2D(a012_map, Tex) * 2.0 - 1.0;

   // read lower-order coeffs from texture and unbias
   // (a5 isn't biased, just halved)
   a345 = tex2D(a345_map, Tex) * 2.0 - 1.0;
   a345[2] += 1.0;

   // Evaluate polynomial
   c = dot(lu2_lv2_lulv, a012) + dot(Light, a345);

   // Multiply by rgb factor
   c = c * tex2D(rgb_map, Tex);

   return c;
}
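For anyone less used to HLSL swizzles, the arithmetic in that shader can be checked on the CPU. This is only a rough Python sketch of the coefficient unbiasing and polynomial evaluation steps. The function names `decode_coeffs` and `ptm_luminance` are made up for illustration, and the cube-map normalization, z-extrapolation and final rgb multiply are left out.

```python
def decode_coeffs(texel_a012, texel_a345):
    """Unbias texel values in [0, 1] the way the shader does (t * 2 - 1).

    The third component of a345 then gets 1.0 added back, matching
    'a345[2] += 1.0' in the HLSL (a5 is halved, not biased).
    """
    a012 = tuple(2.0 * t - 1.0 for t in texel_a012)
    a345 = [2.0 * t - 1.0 for t in texel_a345]
    a345[2] += 1.0
    return a012, tuple(a345)


def ptm_luminance(lu, lv, a012, a345):
    """Evaluate a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5."""
    quad = (lu * lu, lv * lv, lu * lv)  # Light.xyx * Light.xyy
    lin = (lu, lv, 1.0)                 # Light with z forced to 1.0
    return sum(q * a for q, a in zip(quad, a012)) + sum(
        l * a for l, a in zip(lin, a345)
    )


if __name__ == "__main__":
    a012, a345 = decode_coeffs((0.5, 0.75, 1.0), (0.5, 0.5, 0.5))
    print(ptm_luminance(0.5, -0.5, a012, a345))
```

The two sums correspond to the two dot products on the "Evaluate polynomial" line of the shader.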

Clicking the Disassembly button reveals the D3D tokens:

//
// Generated by Microsoft (R) D3DX9 Shader Compiler
//
// Parameters:
//
//   sampler2D a012_map;
//   sampler2D a345_map;
//   float mode;
//   samplerCUBE normalizer;
//   sampler2D rgb_map;
//
//
// Registers:
//
//   Name         Reg   Size
//   ------------ ----- ----
//   mode         c0       1
//   a012_map     s0       1
//   a345_map     s1       1
//   normalizer   s2       1
//   rgb_map      s3       1
//

    ps_2_0
    def c1, 2, -1, 0, 1
    def c2, -1, -1, 0, 2
    dcl t0.xy
    dcl t1.xyz
    dcl_2d s0
    dcl_2d s1
    dcl_cube s2
    dcl_2d s3
    texld r0, t1, s2
    texld r1, t0, s1
    texld r2, t0, s0
    texld r3, t0, s3
    mad r0.xyz, r0, c1.x, c1.y
    dp2add r0.w, r0, r0, c1.z
    rsq r0.w, r0.w
    mul r4.xy, r0, r0.w
    add r0.w, -r0.z, c1.w
    mul r4.xy, r4, r0.w
    cmp r0.w, r0.z, c1.z, c1.w
    mov r4.zw, c1
    cmp r1.w, -c0.x, r4.z, r4.w
    mul r0.w, r0.w, r1.w
    cmp r0.xy, -r0.w, r0, r4
    mul r4.xy, r0, r0
    mul r4.z, r0.y, r0.x
    mad r1.xyz, r1, c2.w, c2
    mov r0.z, c1.w
    dp3 r0.w, r0, r1
    mad r0.xyz, r2, c1.x, c1.y
    dp3 r1.w, r4, r0
    add r0.w, r0.w, r1.w
    mul r0, r3, r0.w
    mov oC0, r0

// approximately 26 instruction slots used (4 texture, 22 arithmetic)

Then, via the drop-menu, I selected the R580:

======== Begin neutral format pixel shader: 0 =============
Shader stats:
  RS Instructions: 2
  TEX Instructions: 4
  ALU Instructions: 9
  ALU Instruction slots: 9
  CF Instructions: 0
  Pix Size: 4
  Highest Const: 2
  Start Addr: 0
  End Addr: 12
RS Instructions:
  rs 00: r00.rg-- = txc00
  rs 01: r01.rgb- = txc01
US Program:
   0 tex 00 : r01.rgb_ = lookup(r01.rgbr, tex02) ign_unc
   1 tex 01 : r02.rgb_ = lookup(r00.rgrr, tex01) ign_unc
   2 tex 02 : r03.rgb_ = lookup(r00.rgrr, tex00) ign_unc
   3 tex 03 : r00.rgba = lookup(r00.rgrr, tex03) sem_wait sem_grab ign_unc
   4 alu 00 pre: srcp.rgb = 1.0-2.0*r01.rgb
   4 alu 00 rgb: r04.--b = d2a(neg(srcp.rg0.0), neg(srcp.rg0.0), 0.0) sem_wait
          alpha: r01.a = cmp(0.0, 1.0, neg(c00.r))
   5 alu 01 rgb: r02.rgb = mad(r02.rgb, (+2.0000000E+00).aaa, c02.rrb)
          alpha: r02.a = rsq(abs(r04.b))
   6 alu 02 pre: srcp.rgb = 1.0-2.0*r01.rgb
   6 alu 02 rgb: r04.rg- = mad(neg(srcp.rg0.0), r02.aar, 0.0)
          alpha: r01.a = cmp(0.0, r01.a, neg(srcp.b))
   7 alu 03 pre: srcp.rgb = 1.0-2.0*r01.rgb
   7 alu 03 rgb: r04.rg- = mad(r04.rg0.0, srcp.bb0.0, r04.rg0.0)
          alpha: mad(0.0, 0.0, 0.0)
   8 alu 04 pre: srcp.rgb = 1.0-2.0*r01.rgb
   8 alu 04 rgb: r01.rg- = cmp(neg(srcp.rg0.0), r04.rg0.0, neg(r01.aar))
          alpha: mad(0.0, 0.0, 0.0)
   9 alu 05 rgb: r04.rgb = mad(r01.rgr, r01.rgg, 0.0)
          alpha: r02.a = mad(1.0, r02.b, 0.0)
  10 alu 06 rgb: d2a(r01.rg0.0, r02.rg0.0, r02.rra)
          alpha: r03.a = dp()
  11 alu 07 pre: srcp.rgb = 1.0-2.0*r03.rgb
  11 alu 07 rgb: dp3(r04.rgb, neg(srcp.rgb))
          alpha: r01.a = dp()
     alu 07 post-NOP
  12 alu 08 pre: srcp.a = r03.a+r01.a
  12 alu 08 rgb: out0.rgb = mad(r00.rgb, srcp.aaa, 0.0)
          alpha: out0.a = mad(r00.a, srcp.a, 0.0) last

Also, metrics are given at the bottom of the window:

5 general purpose registers used (r00-r04)
4.00 cycles on bilinear fetch
4.80 cycles on trilinear
5.60 cycles on aniso

Shader stats:
RS Instructions: 2
TEX Instructions: 4
ALU Instructions: 9
ALU Instruction slots: 9
CF Instructions: 0

Lots of good insight on how and where pairing is happening. You'll begin to see that the D3D 'disasm' is a very high-level view. You really need to see the microcode to know how a given shader is going to execute. This depth in RM is fantastic, and I would love to see continued support. I know GSA has this functionality, but I like support in the authoring tool too. Simultaneous visuals AND codegen feedback is the powerful part.

Think about the optimization workflow ... Mine goes something like:

(1) make some sort of code-side change: remove some math, approximate a function, make something branchless .......
(2) check immediate visual (render)
(3) check immediate metrics (codegen)
Repeat until I am happy with both (2) and (3).
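If the disassembly text is copied out after each change, the numbers from step (3) can be logged across iterations. Below is a small Python sketch that pulls the slot counts from the summary comment at the end of the D3D listing. It assumes the summary keeps the "approximately N instruction slots used (T texture, A arithmetic)" wording shown in the dump above; the function name `slot_counts` is made up for illustration.

```python
import re

# Matches the trailing summary comment of a D3DX9 disassembly listing.
# The parenthesised breakdown is treated as optional, which is an assumption.
SUMMARY = re.compile(
    r"approximately (\d+) instruction slots? used"
    r"(?: \((\d+) texture, (\d+) arithmetic\))?"
)


def slot_counts(disasm):
    """Return slot counts as a dict, or None if no summary line is found."""
    m = SUMMARY.search(disasm)
    if m is None:
        return None
    total, tex, alu = m.groups()
    return {
        "total": int(total),
        "texture": int(tex) if tex is not None else None,
        "arithmetic": int(alu) if alu is not None else None,
    }


if __name__ == "__main__":
    line = "// approximately 26 instruction slots used (4 texture, 22 arithmetic)"
    print(slot_counts(line))
```

Saving one result per iteration makes it easy to see whether a code-side change actually reduced the slot count.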

The Monkey is great for this process. Before we had Xenon alpha kits on Madden

GSA doesn't allow for this kind of iteration. It's a good tool, but it doesn't provide this kind of workflow; something that RM does very well.

Cheers,
Jim Hejl (jim AT hejl DOT com)
EA

tgraupmann · 01-31-2009

Thanks for the hint. I've since been able to port to and from RenderMonkey a couple times.

1. Relief Shader

2. Splat blending shader

I wrote a wiki page about porting shaders: ShaderPortingNotes.

I'm still stumped on how to load a model in RenderMonkey with a second UV set; see my AMD post.

bpurnomo · 02-01-2009

Thank you for the post, jhejl. I wasn't even aware of that feature.


Generate Compiled Assembly with RenderMonkey 1.81