Hi.
Why don't I get vectorized ACML routines when I call transcendental functions on a CPU with the APP SDK?
Or have I missed something?
Update:
Of course, I read the 2.5 release notes just after posting this. Among other things, they say:
The LLVM compiler version used for OpenCL kernels has been upgraded.
Does this mean that it autovectorizes kernels (which it ideally should do, i.e. that we don't need to use explicit vector types)?
Does it mean that transcendental routines are translated into ACML calls? (if not, it really is useless)
Best regards,
Yngve
Originally posted by: yngvesls
"Does this mean that it autovectorizes kernels (which it ideally should do, i.e. that we don't need to use explicit vector types)?"
No autovectorization yet.
"Does it mean that transcendental routines are translated into ACML calls? (if not, it really is useless)"
OpenCL has no relationship with ACML. Which ACML calls are you talking about?
Ok, do you know if AMD is working on autovectorization?
I know that OpenCL does not have an explicit relationship with ACML, but I just thought it was kind of stupid not to use the library in the backend when it is already developed and something AMD have released for download. It makes AMD OpenCL for CPUs quite inferior to any compiler + vector ABI combination when not using vector routines. I'm talking about e.g. exp, log and friends.
Originally posted by: yngvesl
"Ok, do you know if AMD is working on autovectorization?"
I can't tell you the schedule here.
"I know that OpenCL does not have an explicit relationship with ACML, but I just thought it was kind of stupid not to use the library in the backend when it is already developed and something AMD have released for download. It makes AMD OpenCL for CPUs quite inferior to any compiler + vector ABI combination when not using vector routines. I'm talking about e.g. exp, log and friends."
Do you think OpenCL exp and log functions are slower than ACML exp and log?
They might use the same algorithm or idea in both cases, but they don't want to introduce a dependency on ACML.
OpenCL has two kinds of built-in math functions (the regular math functions and the native_* math functions). Please see the spec.
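To illustrate the two kinds of built-ins, here is a minimal sketch of OpenCL kernel source, held as a C string the way a host program would pass it to clCreateProgramWithSource. The kernel names `exp_regular` and `exp_native` and the helper `exp_kernels_src_length` are hypothetical. The kernels use float because, as far as the OpenCL 1.x spec goes, the native_* functions are only defined for float types. The host code that builds and runs the program is omitted.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C source for two kernels that differ only in the exp variant. */
static const char *const EXP_KERNELS_SRC =
    "__kernel void exp_regular(__global const float *in,\n"
    "                          __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = exp(in[i]);        // regular math built-in\n"
    "}\n"
    "\n"
    "__kernel void exp_native(__global const float *in,\n"
    "                         __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = native_exp(in[i]); // accuracy is implementation-defined\n"
    "}\n";

/* Length of the source in bytes, as a host program could pass it
 * alongside the source pointer when creating the program. */
size_t exp_kernels_src_length(void)
{
    return strlen(EXP_KERNELS_SRC);
}
```

Timing both kernels on the same input would show whether the native variant is any faster on a given CPU device. The spec leaves its speed and accuracy to the implementation.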
"Can't tell the schedule" ? I don't understand. Are you working for AMD? If so, I of course understand 😛
Anyway, yes, I strongly believe that OpenCL transcendentals are not vectorized. Using Intel VTune I see that a call to exp ends up in a function called __exp_f64. According to my performance numbers this must be a scalar 64-bit exponential implementation. A vector implementation should be about twice as fast; at least Intel's is.
As long as AMD OpenCL defaults to a 64-bit scalar exp approximation (with no way to change it), it is very unattractive for serious development. I've also tried explicit vector types, with no change in performance.
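For readers unfamiliar with explicit vector types, a kernel of the kind described might look like the sketch below, again held as a C string for the host program. The kernel name `exp_double2` and the helper are hypothetical, and the sketch assumes the device exposes the cl_khr_fp64 extension. It is not the poster's actual code.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C source: each work-item handles one double2, i.e. two doubles. */
static const char *const EXP_DOUBLE2_SRC =
    "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
    "\n"
    "__kernel void exp_double2(__global const double2 *in,\n"
    "                          __global double2 *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = exp(in[i]); // component-wise exp on two doubles\n"
    "}\n";

/* Length of the source in bytes, for passing to the OpenCL runtime. */
size_t exp_double2_src_length(void)
{
    return strlen(EXP_DOUBLE2_SRC);
}
```

If the runtime implements exp on a double2 as two scalar calls, a kernel like this would run no faster than the scalar version, which would match the observation above.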