
CPU OpenCL: No fma on Piledriver?

I am running an OpenCL kernel on Piledriver CPUs (specifically an A10-5750M Richland APU) using AMD's OpenCL implementation. Piledriver CPUs support both FMA3 and FMA4 instructions, so I expected the "fma" builtin in OpenCL to compile to the corresponding hardware instruction. Instead, performance is terrible, and I found that rather than a single instruction, the compiler emits a function call to a software implementation of FMA.

Any idea why fma in OpenCL does not generate FMA hardware instructions? I tested on openSUSE 13.1 64-bit and on Windows 8.1 64-bit, both with Catalyst 13.12 on the same hardware. clinfo reports the following driver version: "Driver version:  1348.5 (sse2,avx,fma4)", so the OpenCL runtime is clearly detecting the presence of FMA instructions.
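For anyone who wants to check that feature list from a host program rather than by reading clinfo, here is a minimal sketch. The string itself would come from clGetDeviceInfo with CL_DRIVER_VERSION. The helper name driver_lists_feature is hypothetical, and the "(sse2,avx,fma4)" layout is only inferred from the clinfo output quoted in this thread, not from any documented format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `feature` appears as a whole
   comma-separated token inside the first parenthesised list of
   `driver_version`, e.g. "1348.5 (sse2,avx,fma4)".
   ASSUMPTION: this layout is inferred from the clinfo output in the
   thread; it is not a documented format. */
bool driver_lists_feature(const char *driver_version, const char *feature)
{
    const char *open = strchr(driver_version, '(');
    if (open == NULL)
        return false;
    const char *close = strchr(open, ')');
    if (close == NULL)
        return false;

    size_t want = strlen(feature);
    const char *token = open + 1;
    while (token < close) {
        /* Each token ends at the next comma, or at ')' for the last one. */
        const char *comma = memchr(token, ',', (size_t)(close - token));
        const char *end = comma ? comma : close;
        if ((size_t)(end - token) == want && memcmp(token, feature, want) == 0)
            return true;
        token = end + 1;
    }
    return false;
}
```

Note that this only tells you what the runtime advertises. As the rest of the thread shows, a listed "fma4" does not guarantee that the compiler actually emits FMA instructions.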

3 Replies

Re: CPU OpenCL: No fma on Piledriver?

You are right. I tried this on a machine with Catalyst 14.2 and a Kaveri processor, and it also did not generate FMA instructions in the CPU assembly, while the GPU code did contain them.

I see in clinfo for CPU:

  Driver version:                          1411.4 (sse2,avx,fma4)

Just for reference, here is the test code I used:

__kernel void myfma(__global float *a, __global float *b, __global float *c, __global float *d)
{
    // One work-item per element: d[i] = a[i] * b[i] + c[i], fused.
    int i = get_global_id(0);
    d[i] = fma(a[i], b[i], c[i]);
}



Re: CPU OpenCL: No fma on Piledriver?

AVX2 and FMA3 support

Intel's OpenCL implementation does not generate FMA instructions for Haswell either.

My belief is that the CPU FMA instructions don't conform to the IEEE 754-2008 floating-point standard, and hence not to the OpenCL standard.


Re: CPU OpenCL: No fma on Piledriver?

moozoo, that seems to be incorrect. Intel's OpenCL implementation does generate FMA instructions. I see the following assembly when I compile with Intel's OpenCL (OpenCL 1.2 (Build 56860)):

    vmovups    (%rsi,%rbx,4), %xmm0
    vmovups    (%r10,%rbx,4), %xmm2
    vmovups    (%rdx,%rbx,4), %xmm1
    vfmadd132ps    %xmm0, %xmm2, %xmm1
    vmovups    %xmm1, (%r9,%rbx,4)