Why don't I get vectorized ACML routines when I call transcendental functions in kernels running on a CPU device with the APP SDK?
Or have I missed something?
Of course, I read the 2.5 release notes just after I posted this. Among other things, they say:
The LLVM compiler version used for OpenCL kernels has been upgraded.
- Includes support for use of SSE3 and SSE4.
- Added support for partial use of FMA4 and XOP instructions.
Does this mean that the compiler now autovectorizes kernels? Ideally it should, so that we don't need to use explicit vector types.
Does it also mean that transcendental routines are translated into ACML calls? (If not, it really is useless.)
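
To make the distinction concrete, here is a rough plain-C sketch of the two kernel styles I mean. It is not OpenCL and not how the APP SDK works internally: `float4_t` is a hypothetical stand-in for OpenCL's `float4`, and `approx_exp` is a hypothetical Taylor-series stand-in for a transcendental library routine. Each loop iteration plays the role of one work-item.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4 (assumption, not the real type). */
typedef struct { float s[4]; } float4_t;

/* Hypothetical stand-in for a transcendental routine such as exp():
   a 12-term Taylor series, adequate only for small |x|. */
static float approx_exp(float x)
{
    float term = 1.0f;
    float sum = 1.0f;
    for (int k = 1; k < 12; ++k) {
        term *= x / (float)k;   /* term now holds x^k / k! */
        sum += term;
    }
    return sum;
}

/* Scalar style: each "work-item" (loop iteration) handles one element.
   An autovectorizing compiler would have to widen this on its own. */
static void exp_scalar(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = approx_exp(in[i]);
}

/* Explicit-vector style: each "work-item" handles four lanes at once,
   which is what writing the kernel with float4 asks for by hand. */
static float4_t approx_exp4(float4_t v)
{
    float4_t r;
    for (int lane = 0; lane < 4; ++lane)
        r.s[lane] = approx_exp(v.s[lane]);
    return r;
}

static void exp_vector(const float4_t *in, float4_t *out, size_t n4)
{
    for (size_t i = 0; i < n4; ++i)
        out[i] = approx_exp4(in[i]);
}
```

Both versions compute the same values. My question is whether the compiler turns the first form into something like the second by itself, and whether the per-lane call then ends up in a vectorized library routine.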