4 Replies Latest reply on Jan 5, 2011 6:00 PM by omion

    mad24 - why no runtime difference?

    FrodoTheGiant

      Hello AMD

       

      I have replaced all a*b+c integer calculations with mad24 in my code (and I have quite a few of those!). Unfortunately I see no runtime difference at all.

      Questions:

      1) Does your OpenCL compiler automatically replace all a*b+c with mad24?

      2) Does your OpenCL implementation support mad24, or does the compiled code silently replace all mad24 with a*b+c?

        • mad24 - why no runtime difference?
          MicahVillmow
          Whether this optimization is applied depends on the hardware support for the mad24 operation. Not all hardware supports both the signed and unsigned versions. Also, if your program is not ALU-bound, you might not see any performance gain.
            • mad24 - why no runtime difference?
              FrodoTheGiant

              Thanks for your reply Micah.

               

              The code runs on a HD6970. Yes, it is memory bound - but since I have lots of those a*b+c I thought I might see at least *some* difference. No?

              • mad24 - why no runtime difference?
                omion
                This is not directly related since you have a 6970, but it should be noted anyway.

                On the 5800 series, signed mul24(a,b) is turned into (((a<<8)>>8)*((b<<8)>>8)). This makes it noticeably SLOWER than simply using a*b. Unsigned mul24(a,b) uses a native function. mad24 is similar. I made some kernels which just looped the same operation over and over:

                signed a * b: 0.9736s
                unsigned mul24(a,b): 0.9734s
                signed mul24(a,b): 2.2771s

                To AMD:
                I don't think mul24 should EVER be slower than 32-bit multiplication. The OpenCL spec says that if the inputs don't fit into a 24-bit number, the answer is implementation-defined. Doing all those bit-shifts will only affect the inputs if they are in the "implementation-defined" range, where the answer can't be relied on anyway. If the inputs already fit into a 24-bit number, the shifts will not affect the result and therefore simply waste time.
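The argument above can be checked on the host with a minimal C sketch. It is not the AMD compiler's generated code. The names `sext24`, `mul24_shifted`, `mul_plain` and `check_in_range` are hypothetical, and the products are widened to 64 bits for comparison, whereas the real mul24 returns only the low 32 bits. `sext24` models the effect of `(x<<8)>>8` with an arithmetic right shift, written with a mask because shifting negative values is not portable in host C.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 24 bits of x as a signed 24-bit value.
   Models (x << 8) >> 8 with an arithmetic right shift. */
static int32_t sext24(int32_t x) {
    uint32_t low = (uint32_t)x & 0xFFFFFFu;
    /* If bit 23 is set the 24-bit value is negative: subtract 2^24. */
    return (low & 0x800000u) ? (int32_t)low - 0x1000000 : (int32_t)low;
}

/* Hypothetical model of the shifted signed mul24 (assumption: the
   multiply itself is an ordinary multiply of the shifted inputs). */
static int64_t mul24_shifted(int32_t a, int32_t b) {
    return (int64_t)sext24(a) * (int64_t)sext24(b);
}

/* Plain multiply, widened the same way so the two can be compared. */
static int64_t mul_plain(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* For inputs inside the signed 24-bit range the shifts change nothing. */
static void check_in_range(void) {
    const int32_t samples[] = { -8388608, -12345, -1, 0, 1, 4096, 8388607 };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            assert(mul24_shifted(samples[i], samples[j]) ==
                   mul_plain(samples[i], samples[j]));
}
```

Calling `check_in_range()` passes for every sampled pair inside the signed 24-bit range. The two functions only disagree once an input such as 8388608 falls outside that range, which is the case the spec leaves implementation-defined.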
              • mad24 - why no runtime difference?
                MicahVillmow
                If your code is memory bound, then your ALU is basically free.
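One way to picture this is a deliberately simplified model in which memory traffic and ALU work overlap completely, so the kernel takes as long as the slower of the two. The sketch below is in C. The function names and the numbers in the usage note are hypothetical, not measurements from the HD 6970.

```c
#include <assert.h>

/* Simplified model (assumption: memory and ALU work overlap fully),
   so kernel time is whichever of the two is larger. */
static double kernel_time(double mem_time, double alu_time) {
    return mem_time > alu_time ? mem_time : alu_time;
}

/* Time saved by an ALU-only optimization under this model. */
static double alu_speedup_gain(double mem_time,
                               double alu_before, double alu_after) {
    return kernel_time(mem_time, alu_before) - kernel_time(mem_time, alu_after);
}

/* In a memory-bound kernel, halving the ALU time saves nothing. */
static void check_memory_bound(void) {
    assert(alu_speedup_gain(10.0, 4.0, 2.0) == 0.0);
}
```

With a made-up memory time of 10 units and ALU time of 4 units, cutting the ALU time to 2 leaves the modelled kernel time at 10. A gain only appears once the ALU time exceeds the memory time.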