3 Replies Latest reply on Jul 9, 2008 6:04 PM by eduardoschardong

    GPU ShaderAnalyser question

    jonathan81

      Hello,

      I have downloaded the new version of GSA (1.43).

      I looked at some examples and I don't understand the results.

      First example:

      kernel void sum(float fN, float4 a<>, float4 b<>, out float4 c<>)
      {
          float f = 0;
          for (f = 0; f < 2044; ++f)
          {
              c = a + b;
          }
      }

      Radeon HD 3870,3,2.00,2.80,2.27,2.00,0.50,2.40,0.42,2.80,0.36,TEX,TEX,TEX,8.00,6.67,5.71,0

      Second example:

      kernel void sum(float fN, float4 a<>, float4 b<>, out float4 c<>)
      {
          float f = 0;
          for (f = 0; f < 2045; ++f)
          {
              c = a + b;
          }
      }


      Radeon HD 3870,4,2.00,63.75,13.91,13.91,6.95,13.91,5.80,13.91,4.97,ALU,ALU,ALU,1.15,1.15,1.15,0

      Why, when I add one iteration, does the disassembly change, and why do the average and the register count increase too?

      Thanks in advance

      J

        • GPU ShaderAnalyser question
          eduardoschardong
          Apparently the optimizer evaluated the first 2043 iterations and found that they don't affect the final result, so it optimized them away. In the second case the optimizer was unable to go beyond 2044 iterations, so it emitted code for the entire loop.
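
The folding described above can be modelled with a small sketch. This is plain Python rather than Brook+, and it only illustrates why the loop is redundant; it is not what the compiler actually does internally. The function names `sum_with_loop` and `sum_folded` are made up for the illustration.

```python
def sum_with_loop(a, b, iterations):
    """Model of the kernel body: c is overwritten with a + b on every pass."""
    c = None
    for _ in range(iterations):
        # Loop-invariant: the value does not depend on the counter or on c,
        # so every pass except the last one is dead work.
        c = a + b
    return c


def sum_folded(a, b):
    """What the loop collapses to once the redundant passes are removed."""
    return a + b


if __name__ == "__main__":
    # The iteration count does not change the result, only the work done.
    for n in (1, 2044, 2045):
        assert sum_with_loop(1.5, 2.25, n) == sum_folded(1.5, 2.25)
    print("loop and folded versions agree")
```

Both versions return the same value for any iteration count of at least 1, which is why an optimizer that manages to evaluate the whole loop can drop it.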
            • GPU ShaderAnalyser question
              jonathan81

              Why do we have this limit?

              I tested this kernel in a new project, with float instead of float4:

              kernel void sum(float a<>, float b<>, out float c<>)
              {
                  float f = 0;
                  for (f = 0; f < 4095; ++f)
                  {
                      c = a + b;
                  }
              }

              Time 0.123029

               

              With this kernel:

               

              kernel void sum(float a<>, float b<>, out float c<>)
              {
                  float f = 0;
                  for (f = 0; f < 4096; ++f)
                  {
                      c = a + b;
                  }
              }

              Time 0.244093

              With 4094 iterations, I get the same time as with 4095 iterations.

               

              In GSA, when I test these two kernels I see the same phenomenon: the register count goes from 3 to 6. Why?

               

                • GPU ShaderAnalyser question
                  eduardoschardong
                  Originally posted by: jonathan81
                  Why do we have this limit?

                  I believe they did it to keep compilation time reasonable; if there were no such limit, buggy code or a latency test would take forever to compile.

                  And from your data it evaluates somewhere between 8192 and 16384 instructions, which is a lot for any constant in real-life programs.
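
One way to arrive at a range like that is sketched below. It assumes the cut-off sits at the 4096th iteration (taken from the float test above) and that each pass of the loop costs between 2 and 4 instructions. The per-iteration costs are a guess made for the arithmetic, not something GSA reports, and `evaluated_instructions` is a hypothetical helper.

```python
def evaluated_instructions(iterations, instructions_per_iteration):
    """Total instructions the optimizer would have to step through."""
    return iterations * instructions_per_iteration


if __name__ == "__main__":
    # 4096 is where the float kernel stopped being folded; the cost of
    # 2 to 4 instructions per pass is an assumption, not measured data.
    low = evaluated_instructions(4096, 2)
    high = evaluated_instructions(4096, 4)
    print(low, high)
```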


                  Originally posted by: jonathan81
                  In GSA, when I test these two kernels I see the same phenomenon: the register count goes from 3 to 6. Why?

                  Same here. In the first case the compiler replaced your code with something like:
                  [code]
                  kernel void sum(float a<>, float b<>, out float c<>)
                  {
                      c = a + b;
                  }
                  [/code]
                  It only needed 3 GPRs for that; in the second case it compiled the entire loop and needed more registers.