5 Replies Latest reply on Feb 29, 2012 1:01 PM by MicahVillmow

    SDK 2.6 CPU FMA support bug?


      I have an Intel CPU where clinfo says:

      IEEE754-2008 fused multiply-add:         Yes
        Name:                                      Intel(R) Xeon(R) CPU       E5430  @ 2.66GHz


      I believe this CPU does not have FMA support; as far as I know, no Intel CPU does. Isn't this a bug?


      So what happens when this function is used in a program?

        • Re: SDK 2.6 CPU FMA support bug?

          Software emulation.
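
          For illustration, here is a minimal C sketch of what emulating a fused multiply-add in software can look like for float inputs. It is not AMD's implementation, and both function names are hypothetical. The product of two floats is exact in double, but the sum is rounded to double and then again to float, so rare inputs can still differ from a true fma().

```c
#include <assert.h>

/* Unfused path: the product is rounded to float, then the sum is rounded again. */
static float mul_then_add(float a, float b, float c)
{
    /* volatile keeps the compiler from contracting this into a hardware FMA */
    volatile float product = a * b;
    return product + c;
}

/* Hypothetical software stand-in for fma() on floats (an assumption, not a
   vendor routine). A 24-bit x 24-bit significand product fits in the 53 bits
   of a double, so the multiplication itself introduces no rounding error. */
static float emulated_fmaf(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

          With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the unfused path loses the low bit of the product and returns 0, while the emulated path keeps it and returns 2^-24. The extra widening and conversion work is one reason an emulated fma() can be slower than a plain multiply and add.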

            • Re: SDK 2.6 CPU FMA support bug?

              Yes, but wouldn't that cause programs to run slower? Also, what is the point of reporting such information about a hardware device when everything can be emulated?


              Actually, Intel's own SDK does not report FMA support on the same CPU, and the same kernel runs 30%-40% faster with it. Perhaps that is due to the emulation?


              I think if the OpenCL code is auto-optimized to use FMA when the hardware does not have it, it will naturally run slower. Then you will end up with people saying that AMD's OpenCL implementation is slow and should be avoided. Do you understand the problem?

                • Re: SDK 2.6 CPU FMA support bug?

                  While running slowly on the CPU is a problem for some customers, AMD's main focus is on GPU performance. Intel has to focus on making their CPU platform run as fast as possible; otherwise no one would use it, since many of the same algorithms can run faster on GPUs or accelerators. While we are working on improving CPU performance, it isn't as high a priority for us as it would be for our competitor.

                    • Re: SDK 2.6 CPU FMA support bug?

                      I don't understand your thinking here, because:


                      1- Adding an emulation feature takes more effort than simply leaving it out. Without it, AMD could divert more time to OpenCL on GPU products. It is good that there is emulation (and Intel's SDK does that too, as far as I can tell), but it is bad that the actual device is reported as if it supports FMA natively.


                      2- For CL_DEVICE_DOUBLE_FP_CONFIG, the OpenCL specification says:


                      "Describes double precision floating-point capability of the OpenCL device."

                      If FMA is emulated, it is simply not a capability of the OpenCL device. Honestly, it is confusing...


                      For example, if I have code that calls the fma() function, it still works with the Intel SDK. But the Intel SDK correctly reports the hardware capabilities, so I won't expect miracles by assuming the device supports something that in reality it doesn't.



                      What if I wrote code that uses FMA but also has different optimizations for architectures that do not support it, and executes a different kernel based on the information returned? When I query the device features, I will end up with "wrong" information...
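
                      A minimal C sketch of that kind of dispatch, under stated assumptions: the kernel names and the helper are hypothetical, and SKETCH_FP_FMA is a local stand-in that mirrors the CL_FP_FMA bit from CL/cl.h so the example compiles without the OpenCL headers. In real host code the bitfield would come from clGetDeviceInfo with CL_DEVICE_SINGLE_FP_CONFIG or CL_DEVICE_DOUBLE_FP_CONFIG.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins (assumptions) so the sketch builds without OpenCL headers.
   The bit position mirrors CL_FP_FMA, which is defined as (1 << 5). */
typedef unsigned long long fp_config_bits;
#define SKETCH_FP_FMA (1ULL << 5)

/* Hypothetical helper: pick a kernel variant from the reported FP config.
   It can only trust the reported bit, so a device that reports FMA but
   emulates it will still be routed to the fma() variant. */
static const char *pick_kernel_name(fp_config_bits fp_config)
{
    return (fp_config & SKETCH_FP_FMA) ? "axpy_fma" : "axpy_mul_add";
}
```

                      Because the choice depends only on the reported bit, a runtime that sets it for an emulated fma() sends the program down the slower path, which is the situation described above.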


                      I guess there is a reason why Intel decided to not show fake information...


                      3- AMD also produces processors with many cores. I would have thought they would at least want to put a little more effort into getting things right.



                      In either case, I am just trying to point out why it is wrong to report this information. I am not trying to argue, but in my opinion this is a bug...