4 Replies Latest reply on Aug 30, 2011 10:55 AM by yngvesl

    Why on earth doesn't APP SDK utilize ACML?

    yngvesl

      Hi.

      I really wonder why I don't get vectorized ACML routines when I call transcendental functions on a CPU with the APP SDK.

      Or have I missed something?

      Update:

      Of course, I read the 2.5 release notes just after posting this. Among other things, they say:

      The LLVM compiler version used for OpenCL kernels has been upgraded. 

      • Includes support for use of SSE3 and SSE4.
      • Added support for partial use of FMA4 and XOP instructions.

       

      Does this mean that it autovectorizes kernels (which it ideally should, so that we don't have to use explicit vector types)?

      Does it mean that transcendental routines are translated into ACML calls? (If not, it really is useless.)

      Best regards,

      Yngve
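To make the distinction in the question concrete: with autovectorization you write plain scalar code and the compiler packs several elements into one SIMD register (in OpenCL, roughly: several work-items per SIMD instruction); with explicit vector types such as OpenCL's float4 or double2, you do that packing yourself. The sketch below shows both forms of a simple y = a*x + y loop in plain C. It assumes GCC or Clang, whose vector_size attribute stands in here for OpenCL's double2; axpy_scalar and axpy_v2 are hypothetical names, and this is not how the APP SDK compiles kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang extension: two doubles in one 16-byte (SSE2-sized) vector.
   Used here as a stand-in for OpenCL's double2. */
typedef double v2df __attribute__((vector_size(16)));

/* Scalar form: one element per iteration. Whether it gets vectorized
   is left entirely to the compiler. */
void axpy_scalar(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Explicit-vector form: two elements per iteration, packed by hand. */
void axpy_v2(double a, const double *x, double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    v2df va = {a, a};                    /* broadcast a into both lanes */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* load two doubles (no alignment assumed) */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;               /* lane-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);   /* store both lanes back */
    }
    for (; i < n; ++i)                   /* scalar tail when n is odd */
        y[i] = a * x[i] + y[i];
}
```

The hand-written version needs its own tail loop for odd n; that bookkeeping is exactly what an autovectorizing compiler would otherwise do for you.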

        • genaganna

           

           

          Originally posted by: yngvesl

          Does this mean that it autovectorizes kernels (which it ideally should, so that we don't have to use explicit vector types)?



          No autovectorization yet.



          Does it mean that transcendental routines are translated into ACML calls? (If not, it really is useless.)



          OpenCL has no relationship with ACML. Which ACML calls are you talking about?

            • yngvesl

              Ok, do you know if AMD is working on autovectorization?

              I know that OpenCL does not have an explicit relationship with ACML, but I just thought it was kind of stupid not to use the library in the backend when it has already been developed and AMD has released it for download. Without vectorized routines, AMD OpenCL for CPUs is quite inferior to any compiler + vector ABI combination. I'm talking about, e.g., exp, log and friends.

               

                • genaganna

                   

                  Originally posted by: yngvesl

                  Ok, do you know if AMD is working on autovectorization?

                  I can't tell you the schedule here.

                   

                  I know that OpenCL does not have an explicit relationship with ACML, but I just thought it was kind of stupid not to use the library in the backend when it has already been developed and AMD has released it for download. Without vectorized routines, AMD OpenCL for CPUs is quite inferior to any compiler + vector ABI combination. I'm talking about, e.g., exp, log and friends.

                   

                   



                  Do you think OpenCL exp and log functions are slower than ACML exp and log?

                  They might be using the same algorithm or idea in both cases, but they don't want to introduce a dependency on ACML.

                  OpenCL has two kinds of built-in math functions: the standard math functions and the native_* math functions. Please see the spec.

                    • yngvesl

                      "Can't tell the schedule"? I don't understand. Are you working for AMD? If so, of course I understand :p

                      Anyway, yes, I strongly believe that OpenCL transcendentals are not vectorized. Using Intel VTune, I can see that a call to exp ends up in a function called __exp_f64. Judging from my performance numbers, this must be a scalar 64-bit exponential implementation. A vector implementation should be twice as fast; at least Intel's is.

                      As long as AMD OpenCL defaults to a scalar 64-bit exp approximation (with no way to change it), it is very unattractive for serious development. I've also tried using vector types, with no change in performance.