2 Replies Latest reply on Jan 23, 2012 9:37 PM by corry

      As usual, I'm working in IL and looking for the best performance I can get. What else is new? :)

      Anyhow, I've run into a situation where I think avoiding exponents would be prohibitively slow, so I looked at the pow IL instruction, and then at the ISA it compiles to...

      I suppose it would make more sense if I knew what EXP_e was, but the sequence seems to do some logs, some ORs, some MULs, and some EXP_e's. So I take it there is no native pow instruction on Cayman? Is pow really taking ~5 instruction words? That's not horrible, but in several cases I expect exponents of less than 5, and all the numbers are 24 bits or less, so I suspect a short chain of mul24s would be faster. The catch is that the exponent is arbitrary and depends on program flow, and conditionals are expensive, so I think adding conditionals for those cases would cause a major slowdown, especially compared with the ~5 cycles the pow costs.
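
For reference, the usual way to build pow out of log and exp primitives is the identity x^y = 2^(y * log2 x), valid for x > 0. GPU instruction sets commonly expose base-2 exp and log, so a log, a multiply and an exp (plus extra bit work for signs, zero and other special cases) would match the shape of the sequence described above. That is only a guess about the Cayman code, not something confirmed in this thread. The Python sketch that follows (the function names are hypothetical) models just the positive-base path:

```python
import math

LN2 = math.log(2.0)


def exp2(t: float) -> float:
    """2**t built from math.exp, so the sketch does not call pow itself."""
    return math.exp(t * LN2)


def pow_via_exp2_log2(x: float, y: float) -> float:
    """x**y for finite x > 0 via the identity x**y == exp2(y * log2(x)).

    Zero, negative bases, NaN and infinity are rejected instead of being
    special-cased the way a complete pow implementation has to.
    """
    if not (x > 0.0 and math.isfinite(x) and math.isfinite(y)):
        raise ValueError("sketch handles finite x > 0 and finite y only")
    return exp2(y * math.log2(x))  # log2, multiply, exp2


if __name__ == "__main__":
    for x, y in [(2.0, 3.0), (3.0, 0.5), (10.0, 2.0), (0.5, 4.0)]:
        print(f"{x} ** {y}: sketch={pow_via_exp2_log2(x, y):.12g} builtin={x ** y:.12g}")
```

The sketch and Python's own `**` should agree to within floating-point rounding, not necessarily bit for bit, since the two paths round at different steps.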

      That's my understanding of what I'm seeing, anyway. Any other suggestions?
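
One way to handle the small-exponent case without branching is to compute every candidate power unconditionally and keep the right one with a select. Below is a hedged Python model of that idea (the function name is hypothetical). On the GPU, the 0/1 mask-and-add step would correspond to select or conditional-move instructions, so every thread follows the same path. It assumes the exponent is known to lie in 0..4; anything else returns 0 here, so a real kernel would still need a fallback to pow.

```python
def small_pow_select(x: int, e: int) -> int:
    """x**e for 0 <= e <= 4 without branching on e.

    Every candidate power is computed unconditionally (three multiplies),
    then exactly one is kept using 0/1 masks. Exponents outside 0..4
    produce 0, so callers must guarantee the range.
    """
    x2 = x * x
    x3 = x2 * x
    x4 = x2 * x2
    candidates = (1, x, x2, x3, x4)
    # (e == k) is False/True, i.e. 0/1 as an int: only the matching term survives.
    return sum(p * (e == k) for k, p in enumerate(candidates))


if __name__ == "__main__":
    print([small_pow_select(3, e) for e in range(5)])  # [1, 3, 9, 27, 81]
```

Whether this beats pow depends on how many slots the extra multiplies and selects cost, which would need measuring. Also, as with OpenCL's `mul24`, a 24-bit multiply is only defined for operands that fit in 24 bits, so a mul24 chain is only safe while the intermediate powers also fit.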

        • Re: exponents...

          You can contact Yousef (Yousef.Shajrawi@amd.com); please tell him that dov.caspi@amd.com sent you.

          You can ask him about IL and other optimization stuff.

            • Re: exponents...

              Thanks, I'll direct future questions his way. In the meantime, I think I've found a clever way around exponents, at least for now. I still have to compute one pow, but it sits outside a loop that runs ~1M times, so one pow per million iterations is a hit I can take. I might send an e-mail anyway.

              I didn't have much time to examine the code, but I figure one of three things is going on: (1) knowing what the suffix means would make everything clear, (2) there's a cool binary trick for computing pow, or (3) there's a cool binary trick for approximating pow. If it's 2, I'd really like to know; if it's 3, it's still cool; and if it's 1, at least I could make complete sense of the ISA code I'm seeing.
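
On the "binary trick" guess: the classic exact trick for integer exponents is square-and-multiply (exponentiation by squaring), which walks the exponent's bits, squaring once per bit and multiplying in the current power whenever a bit is set. A minimal Python sketch follows (the function name is hypothetical). It covers non-negative integer exponents only, and nothing in this thread confirms that the Cayman pow sequence works this way; the log/mul/exp pattern seen in the ISA seems closer to the exp2(y * log2 x) identity, which a real-valued exponent needs anyway.

```python
def pow_by_squaring(base, exp: int):
    """base**exp for an integer exp >= 0 by square-and-multiply.

    Each loop iteration examines one bit of exp, so the loop runs
    floor(log2(exp)) + 1 times for exp > 0.
    """
    if exp < 0:
        raise ValueError("non-negative integer exponents only")
    result = 1
    while exp:
        if exp & 1:        # low bit set: this power of base is part of the product
            result *= base
        base *= base       # square base for the next bit position
        exp >>= 1          # move on to the next bit
    return result


if __name__ == "__main__":
    print(pow_by_squaring(3, 13))  # 1594323
```

For e = 13 (binary 1101) the loop multiplies in base^1, base^4 and base^8, since 1 + 4 + 8 = 13.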

              As I said, though, the code I added isn't having a significant impact on performance right now, and since time is always tight, I'll have to leave it alone for at least a little while...