1 Reply Latest reply on Apr 10, 2008 11:25 PM by michael.chu

    brcc, calcl and mul/add vs mad

    sgratton
      Hi there,

      I've been looking at the float and double matrix multiply examples and have noticed that brcc does not seem to emit either mad or dmad IL instructions very often. This made me worry that these and similar programs in Brook+ might only reach 1/2 or 2/3 of peak performance.

      However, with the help of the shader analyser it seems that the CAL compiler does optimize the float case into a MULADD assembly instruction, but won't do the same for the double case. Instead it emits two MUL_64s followed by an ADD_64. So for a pair of doubles this seems to take 3 instruction groups rather than the 2 it would need if it were using MULADD_64 twice. From here we know that all instruction groups take one cycle, so we seem to be getting only 2/3 of theoretical performance. Will this change in time, or is there another reason I'm missing why MULADD_64s are not being used here?
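      A rough way to check the 2/3 figure is a tiny cost model. This is only a hypothetical sketch: it assumes, as above, that every instruction group takes one cycle, and it uses the per-pair group counts seen in the shader analyser (3 unfused, 2 fused). The function names and the loop size are illustrative, not part of Brook+ or CAL.

```python
from fractions import Fraction

# Hypothetical cost model following the assumptions above: every ALU
# instruction group issues in one cycle, so ALU time is simply the
# number of instruction groups the kernel needs.


def groups_for_pairs(pairs: int, fused: bool) -> int:
    """Instruction groups needed for `pairs` pairs of double multiply-adds.

    Per-pair counts are the ones observed in the shader analyser:
      fused   -> MULADD_64, MULADD_64    = 2 groups per pair
      unfused -> MUL_64, MUL_64, ADD_64  = 3 groups per pair
    """
    return pairs * (2 if fused else 3)


def fraction_of_peak(pairs: int) -> Fraction:
    """Exact share of the fused throughput reached by the unfused code."""
    return Fraction(groups_for_pairs(pairs, fused=True),
                    groups_for_pairs(pairs, fused=False))


if __name__ == "__main__":
    n = 1024  # hypothetical number of pairs processed by an inner loop
    print(groups_for_pairs(n, fused=False), "groups unfused")  # 3072
    print(groups_for_pairs(n, fused=True), "groups fused")     # 2048
    print(fraction_of_peak(n))                                  # 2/3
```

      The ratio does not depend on the loop length, which is why the estimate comes out at 2/3 of theoretical performance regardless of matrix size, as long as the multiply-adds dominate.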

      Best,
      Steven.
          michael.chu
          Hi Steven,

          Over time, we are definitely going to continue doing performance tuning. Honestly, performance tuning in Brook+ has not had enough time spent on it yet, since the focus has been on providing features and functionality first, with optimization as a separate pass. So I am almost certain that your Brook+ code will get faster as the Brook+ team shifts into performance optimization mode. :-)

          I will, however, forward this exact post to the Brook+ team so they have a concrete data point to look at.

          Michael.