3 Replies Latest reply on Mar 11, 2014 2:15 AM by yurtesen

    CPU OpenCL: No fma on Piledriver?

    rahulgarg

      I am running an OpenCL kernel on a Piledriver CPU (specifically an A10-5750M Richland APU) using AMD's OpenCL implementation. Piledriver CPUs support both FMA3 and FMA4 instructions, so I expected the OpenCL "fma" builtin to compile to the corresponding hardware instruction. Instead, performance is terrible, and I found that the compiler emits a call to a software implementation of FMA rather than a single instruction.

       

      Any idea why fma in OpenCL does not generate FMA hardware instructions? I tested on OpenSUSE 13.1 64-bit and on Windows 8.1 64-bit, both with Catalyst 13.12 on the same hardware. clinfo reports the driver version as "Driver version:  1348.5 (sse2,avx,fma4)", so the OpenCL runtime is clearly detecting the presence of FMA instructions.
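For anyone who wants to check that driver string programmatically rather than by eye, here is a minimal sketch in plain C. It assumes the feature list always has the shape seen in this thread (a comma-separated list in parentheses after the version number), which is an observation from the two clinfo outputs quoted here, not a documented format. The helper name `driver_has_feature` is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `feature` appears as a whole
   comma-separated token inside the parenthesised list of a driver
   version string such as "1348.5 (sse2,avx,fma4)", and 0 otherwise.
   The string layout is assumed from the clinfo output in this thread. */
int driver_has_feature(const char *driver_version, const char *feature)
{
    const char *p = strchr(driver_version, '(');
    size_t flen = strlen(feature);

    if (p == NULL || flen == 0)
        return 0;
    p++; /* skip the opening parenthesis */

    while (*p != '\0' && *p != ')') {
        size_t len = strcspn(p, ",)"); /* length of the current token */
        if (len == flen && strncmp(p, feature, len) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++; /* step over the separator */
    }
    return 0;
}
```

Tokens are compared whole, so asking for "fma" does not match "fma4". Note that a positive result only says the runtime detected the instruction set; as this thread shows, it does not guarantee the compiler will emit those instructions.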

        • Re: CPU OpenCL: No fma on Piledriver?
          yurtesen

          You are right. I tried this on a machine with Catalyst 14.2 and a Kaveri processor, and the CPU assembly contained no FMA instructions either, while the GPU code did.

           

          I see in clinfo for CPU:

            Driver version:                          1411.4 (sse2,avx,fma4)

           

          Just for reference, here is the test code I used:

           

          __kernel void myfma(__global float *a, __global float *b, __global float *c, __global float *d)
          {
              int i = get_global_id(0);
              d[i] = fma(a[i], b[i], c[i]);
          }

          • Re: CPU OpenCL: No fma on Piledriver?
            moozoo

            AVX2 and FMA3 support

            Intel's OpenCL implementation does not generate FMA instructions for Haswell either.

            My belief is that the CPU FMA instructions don't conform to the IEEE 754-2008 floating-point standard, and hence don't meet the requirements of the OpenCL standard.

              • Re: CPU OpenCL: No fma on Piledriver?
                yurtesen

                moozoo, you seem to be incorrect: Intel's OpenCL does generate FMA instructions. I see the following assembly when I compile with Intel (OpenCL 1.2 (Build 56860)):

                 

                vmovups    (%rsi,%rbx,4), %xmm0
                vmovups    (%r10,%rbx,4), %xmm2
                vmovups    (%rdx,%rbx,4), %xmm1
                vfmadd132ps    %xmm0, %xmm2, %xmm1
                vmovups    %xmm1, (%r9,%rbx,4)