You are right. I tried this on a machine with Catalyst 14.2 and a Kaveri processor, and the CPU assembly also contained no FMA instructions, while the generated GPU code did.
clinfo reports the following for the CPU device:
Driver version: 1411.4 (sse2,avx,fma4)
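For reference, here is the test kernel I used:
__kernel void myfma(__global const float *a, __global const float *b, __global const float *c, __global float *d)
{
    int i = get_global_id(0);
    // Each work-item computes one element; fma() takes and returns floats, not pointers.
    d[i] = fma(a[i], b[i], c[i]);
}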