4 Replies Latest reply on Feb 1, 2009 3:08 AM by bpurnomo

    Generate Compiled Assembly with RenderMonkey 1.81

      Can RenderMonkey generate compiled shader assembly?

      Is it possible for RenderMonkey to show the compiled assembly for a shader?

      I'm using the Unity3D engine, which lets me view the assembly that Cg compiles. I want to compare this assembly against what RenderMonkey produces to identify problems.

      Basically, I have a shader that works on the Mac but not on Windows, and I want to compare the assembly to track down the issue.
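One quick way to spot differences between two listings of the same shader is a plain text diff after stripping comments and normalizing whitespace. The following is a minimal Python sketch using the standard library's difflib. It assumes both listings are pasted as text in the same D3D9 assembly syntax, and the helper names `normalize` and `diff_listings` are hypothetical. Different compilers often allocate registers and order instructions differently, so expect some noise: a diff points you at candidates, not directly at the bug.

```python
import difflib
import re


def normalize(listing: str) -> list[str]:
    """Drop '//' comments, collapse runs of whitespace, and skip blank lines."""
    lines = []
    for raw in listing.splitlines():
        code = raw.split("//", 1)[0]
        code = re.sub(r"\s+", " ", code).strip()
        if code:
            lines.append(code)
    return lines


def diff_listings(a: str, b: str, name_a: str = "a", name_b: str = "b") -> list[str]:
    """Unified diff of two normalized assembly listings (empty list if identical)."""
    return list(difflib.unified_diff(normalize(a), normalize(b),
                                     fromfile=name_a, tofile=name_b, lineterm=""))


if __name__ == "__main__":
    # Hypothetical snippets standing in for a Cg listing and a RenderMonkey listing.
    cg_asm = "texld r0, t0, s0\nmul r0, r0, c0  // from Cg"
    rm_asm = "texld r0, t0, s0\nmad r0, r0, c0, c1"
    print("\n".join(diff_listings(cg_asm, rm_asm, "cg.asm", "rendermonkey.asm")))
```

Normalizing first keeps the diff focused on opcodes and operands rather than on comment headers and indentation, which differ between tools even when the code is identical.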



        • Generate Compiled Assembly with RenderMonkey 1.81

          Hi Tim,

            GPU ShaderAnalyzer can do what you are looking for.


            • Generate Compiled Assembly with RenderMonkey 1.81

              RenderMonkey has disassembly, at a few levels.

              Open a Shader Editor window (where you actually type the shader code).

              Look at the toolbar that contains the shader "Target" and "Entry Point" fields, and locate the "Disassembly" button.

              Don't feel bad: for some reason, this button seems to hide from everyone. That's a shame, because there are lots of goodies inside.

              When you open the Disassembly window, you'll be presented with the D3D tokens (although they insist on calling it assembly). Notice the drop-down menu near the top of this window: it gives you access to microcode disassembly for various AMD chipsets. (Cool!)

              In RM 1.81, the newest chipset supported is the R580, so this feature was much cooler in 2006. RenderMonkey hasn't been updated since, so there is no microcode for the R600/R700.

              AMD, obvious request: can we get microcode for all of your chipsets inside RM? Thank you!

              Anyway, if you have not seen this feature before, here's a quick demo.

              This is the pixel (fragment) shader HLSL for the "Polynomial Texture Map" sample that ships with RM:

              float mode;
              sampler a012_map;
              sampler a345_map;
              sampler normalizer;
              sampler rgb_map;
              // Polynomial texture map lighting                              //
              //                                                              //
              // (C) Nathaniel Hoffman, 2003                                  //
              //                                                              //
              // Based on 'Polynomial Texture Maps', SIGGRAPH 2001, by Tom    //
              // Malzbender, Dan Gelb and Hans Wolters from HP Labs           //
              float4 main( float4 Tex: TEXCOORD0, float3 Light:   TEXCOORD1 ) : COLOR
              {
                 float3 lu2_lv2_lulv;
                 float4 c;
                 float3 a012;
                 float3 a345;

                 // Normalize light direction
                 Light = texCUBE(normalizer, Light) * 2.0 - 1.0;

                 // z-extrapolation
                 if (mode > 0.0f && Light.z < 0.0)
                 {
                    Light.xy = normalize(Light.xy);
                    Light.xy *= (1.0 - Light.z);
                 }
                 Light.z = 1.0;

                 // Prepare higher-order terms
                 lu2_lv2_lulv = Light.xyx * Light.xyy;

                 // read higher-order coeffs from texture and unbias
                 a012 = tex2D(a012_map, Tex) * 2.0 - 1.0;

                 // read lower-order coeffs from texture and unbias
                 // (a5 isn't biased, just halved)
                 a345 = tex2D(a345_map, Tex) * 2.0 - 1.0;
                 a345[2] += 1.0;

                 // Evaluate polynomial
                 c = dot(lu2_lv2_lulv, a012) + dot(Light, a345);

                 // Multiply by rgb factor
                 c = c * tex2D(rgb_map, Tex);
                 return c;
              }
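When a shader misbehaves on only one platform, a CPU reference of its math gives you known-good numbers to compare against. Below is a minimal Python sketch of the per-pixel math in the shader above. It assumes the texels have already been fetched as floats in [0, 1] and that the light vector has already been normalized (the cube-map lookup and texture sampling are not modeled), and `ptm_pixel` and `unbias` are hypothetical names. It evaluates the same polynomial as the shader: L = a0*u^2 + a1*v^2 + a2*u*v + a3*u + a4*v + a5.

```python
import math


def unbias(texel):
    """Map texel values from [0, 1] to [-1, 1], like '* 2.0 - 1.0' in the shader."""
    return [2.0 * x - 1.0 for x in texel]


def ptm_pixel(a012_texel, a345_texel, rgb_texel, light, mode):
    """CPU reference for the shader's polynomial evaluation (sketch, not the sample itself).

    light: already-normalized light direction (the shader gets it from a cube map).
    Texels: RGB(A) floats in [0, 1]. Returns rgb_texel scaled by the PTM luminance.
    """
    lx, ly, lz = light
    # z-extrapolation: for below-horizon lights, push xy outward by (1 - z)
    if mode > 0.0 and lz < 0.0:
        n = math.hypot(lx, ly)
        if n > 0.0:  # guard added here; the shader's normalize() has no such check
            lx, ly = lx / n, ly / n
        lx, ly = lx * (1.0 - lz), ly * (1.0 - lz)
    lz = 1.0

    # Higher-order terms (u*u, v*v, u*v), i.e. Light.xyx * Light.xyy
    quad = (lx * lx, ly * ly, lx * ly)
    a012 = unbias(a012_texel[:3])
    a345 = unbias(a345_texel[:3])
    a345[2] += 1.0  # a5 is stored halved, not biased

    lum = sum(q * a for q, a in zip(quad, a012)) + lx * a345[0] + ly * a345[1] + lz * a345[2]
    return [lum * c for c in rgb_texel]
```

Feeding the same texel values you sample on each platform through this function tells you which platform's output diverges from the math, before you start reading assembly.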

              Clicking the Disassembly button reveals the D3D tokens:

              // Generated by Microsoft (R) D3DX9 Shader Compiler
              // Parameters:
              //   sampler2D a012_map;
              //   sampler2D a345_map;
              //   float mode;
              //   samplerCUBE normalizer;
              //   sampler2D rgb_map;
              // Registers:
              //   Name         Reg   Size
              //   ------------ ----- ----
              //   mode         c0       1
              //   a012_map     s0       1
              //   a345_map     s1       1
              //   normalizer   s2       1
              //   rgb_map      s3       1

                  def c1, 2, -1, 0, 1
                  def c2, -1, -1, 0, 2
                  dcl t0.xy
                  dcl t1.xyz
                  dcl_2d s0
                  dcl_2d s1
                  dcl_cube s2
                  dcl_2d s3
                  texld r0, t1, s2
                  texld r1, t0, s1
                  texld r2, t0, s0
                  texld r3, t0, s3
                  mad r0.xyz, r0, c1.x, c1.y
                  dp2add r0.w, r0, r0, c1.z
                  rsq r0.w, r0.w
                  mul r4.xy, r0, r0.w
                  add r0.w, -r0.z, c1.w
                  mul r4.xy, r4, r0.w
                  cmp r0.w, r0.z, c1.z, c1.w
                  mov r4.zw, c1
                  cmp r1.w, -c0.x, r4.z, r4.w
                  mul r0.w, r0.w, r1.w
                  cmp r0.xy, -r0.w, r0, r4
                  mul r4.xy, r0, r0
                  mul r4.z, r0.y, r0.x
                  mad r1.xyz, r1, c2.w, c2
                  mov r0.z, c1.w
                  dp3 r0.w, r0, r1
                  mad r0.xyz, r2, c1.x, c1.y
                  dp3 r1.w, r4, r0
                  add r0.w, r0.w, r1.w
                  mul r0, r3, r0.w
                  mov oC0, r0

              // approximately 26 instruction slots used (4 texture, 22 arithmetic)
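To cross-check the compiler's summary line against the listing, you can count the instructions yourself. A rough Python sketch, assuming the listing layout above; `count_instructions` is a hypothetical helper. It skips comments plus `def*`/`dcl*` declarations, treats the `texld` family as texture instructions, and lumps every other opcode (including any `texkill`) into arithmetic, which is a simplification. It counts instructions, not slots: the compiler's figure is an "approximately" slot estimate, and some ps_2_x instructions occupy more than one slot, so the two need not agree exactly.

```python
def count_instructions(listing: str) -> dict:
    """Count texture vs. arithmetic instructions in a D3D9 pixel-shader listing.

    Counts instructions, not the instruction *slots* the compiler reports.
    """
    counts = {"texture": 0, "arithmetic": 0}
    for raw in listing.splitlines():
        code = raw.split("//", 1)[0].strip()  # drop comments and indentation
        if not code:
            continue
        opcode = code.split()[0].lower()
        if opcode.startswith(("def", "dcl", "ps_")):
            continue  # constant definitions, declarations, version token
        if opcode.startswith("texld"):
            counts["texture"] += 1  # texld, texldp, texldb, ...
        else:
            counts["arithmetic"] += 1  # simplification: everything else
    return counts
```

For the listing above, this gives 4 texture and 21 arithmetic instructions, against the compiler's estimate of 22 arithmetic slots.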

              Then I selected the R580 from the drop-down menu:

              ======== Begin neutral format pixel shader: 0 =============
               Shader stats:
                   RS Instructions:         2
                   TEX Instructions:        4
                   ALU Instructions:        9
                   ALU Instruction slots:   9
                   CF Instructions:         0
                   Pix Size:                4
                   Highest Const:           2
                   Start Addr:              0
                   End Addr:               12
               RS Instructions:
                 rs 00:                            r00.rg-- = txc00
                 rs 01:                            r01.rgb- = txc01
               US Program:

                0 tex 00    :  r01.rgb_ = lookup(r01.rgbr, tex02) ign_unc
                1 tex 01    :  r02.rgb_ = lookup(r00.rgrr, tex01) ign_unc
                2 tex 02    :  r03.rgb_ = lookup(r00.rgrr, tex00) ign_unc
                3 tex 03    :  r00.rgba = lookup(r00.rgrr, tex03) sem_wait sem_grab ign_unc
                4 alu 00 pre:  srcp.rgb = 1.0-2.0*r01.rgb
                4 alu 00 rgb:             r04.--b = d2a(neg(srcp.rg0.0), neg(srcp.rg0.0), 0.0) sem_wait
                       alpha:             r01.a   = cmp(0.0, 1.0, neg(c00.r)) 
                5 alu 01 rgb:             r02.rgb = mad(r02.rgb, (+2.0000000E+00).aaa, c02.rrb)
                       alpha:             r02.a   = rsq(abs(r04.b)) 
                6 alu 02 pre:  srcp.rgb = 1.0-2.0*r01.rgb
                6 alu 02 rgb:             r04.rg- = mad(neg(srcp.rg0.0), r02.aar, 0.0)
                       alpha:             r01.a   = cmp(0.0, r01.a, neg(srcp.b)) 
                7 alu 03 pre:  srcp.rgb = 1.0-2.0*r01.rgb
                7 alu 03 rgb:             r04.rg- = mad(r04.rg0.0, srcp.bb0.0, r04.rg0.0)
                       alpha:                       mad(0.0, 0.0, 0.0) 
                8 alu 04 pre:  srcp.rgb = 1.0-2.0*r01.rgb
                8 alu 04 rgb:             r01.rg- = cmp(neg(srcp.rg0.0), r04.rg0.0, neg(r01.aar))
                       alpha:                       mad(0.0, 0.0, 0.0) 
                9 alu 05 rgb:             r04.rgb = mad(r01.rgr, r01.rgg, 0.0)
                       alpha:             r02.a   = mad(1.0, r02.b, 0.0) 
                10 alu 06 rgb:                       d2a(r01.rg0.0, r02.rg0.0, r02.rra)
                       alpha:             r03.a   = dp() 
                11 alu 07 pre:  srcp.rgb = 1.0-2.0*r03.rgb
                11 alu 07 rgb:                       dp3(r04.rgb, neg(srcp.rgb))
                       alpha:             r01.a   = dp() 
                 alu 07 post-NOP
                12 alu 08 pre:  srcp.a   = r03.a+r01.a
                12 alu 08 rgb:  out0.rgb =           mad(r00.rgb, srcp.aaa, 0.0)
                       alpha:  out0.a   =           mad(r00.a, srcp.a, 0.0)  last

              The window also reports metrics at the bottom:

              5 general purpose registers used (r00-r04)
              4.00 cycles on bilinear fetch
              4.80 cycles on trilinear
              5.60 cycles on aniso

              Shader stats:
                   RS Instructions:         2
                   TEX Instructions:        4
                   ALU Instructions:        9
                   ALU Instruction slots:   9
                   CF Instructions:         0

              There's lots of good insight here into how and where pairing happens: the 22 arithmetic slots in the D3D listing become 9 ALU instructions on the R580. You'll begin to see that the D3D "disasm" is a very high-level view; you really need to see the microcode to know how a given shader is going to execute. This depth in RM is fantastic, and I would love to see continued support. I know GSA has this functionality, but I'd like support in the authoring tool as well. Simultaneous visuals AND codegen feedback is the powerful part.

              Think about the optimization workflow. Mine goes something like this:

              (1) make some sort of code-side change: remove some math, approximate a function, make something branchless, etc.
              (2) check immediate visual (render)
              (3) check immediate metrics (codegen)
              Repeat until I am happy with both (2) and (3).
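Step (3) can be partly scripted if you paste the Shader stats block into a file after each iteration. A minimal Python sketch, assuming the "Name: value" layout of the stats block shown earlier; `parse_shader_stats` and `stat_deltas` are hypothetical helpers, not part of RenderMonkey or GSA.

```python
import re

# Matches lines such as "ALU Instructions:        9" (layout assumed from RM's output).
STAT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?):\s+(\d+)\s*$")


def parse_shader_stats(text: str) -> dict:
    """Collect 'Name: integer' pairs from a pasted Shader stats block."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def stat_deltas(before: dict, after: dict) -> dict:
    """Return after - before for each stat present in both runs, omitting unchanged ones."""
    return {name: after[name] - before[name]
            for name in before.keys() & after.keys()
            if after[name] != before[name]}
```

A negative delta on "ALU Instructions" after an edit is the codegen half of the check; the render in step (2) still has to look right.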

              The Monkey is great for this process. Before we had Xenon alpha kits on Madden

              GSA doesn't allow for this kind of iteration. It's a good tool, but it doesn't support this workflow, which RM handles very well.

              Jim Hejl (jim AT hejl DOT com)