You are right. I tried this on a machine with Catalyst 14.2 and a Kaveri processor, and the CPU assembly contained no FMA instructions either, while the GPU code did.
I see in clinfo for CPU:
Driver version: 1411.4 (sse2,avx,fma4)
Just for reference, here is the test code I used:
__kernel void myfma(__global float *a, __global float *b, __global float *c, __global float *d) {
    size_t i = get_global_id(0);  /* one work-item per element */
    d[i] = fma(a[i], b[i], c[i]);
}
The Intel OpenCL runtime doesn't generate FMA instructions for Haswell either.
My belief is that the CPU FMA instructions don't conform to the IEEE 754-2008 floating-point standard, and hence don't satisfy the OpenCL standard.
moozoo, that seems to be incorrect. Intel OpenCL does generate FMA instructions. I see the following assembly when I compile with Intel (OpenCL 1.2 (Build 56860)):
vmovups (%rsi,%rbx,4), %xmm0
vmovups (%r10,%rbx,4), %xmm2
vmovups (%rdx,%rbx,4), %xmm1
vfmadd132ps %xmm0, %xmm2, %xmm1
vmovups %xmm1, (%r9,%rbx,4)