11 Replies Latest reply on Oct 28, 2014 7:45 AM by musale87

    Maths in double precision gives me different results on AMD and NVIDIA

    musale87

      Hi everyone,

       

      I ported a cellular automaton from CPU to GPU using OpenCL, and I have the following hardware:

      CPU Xeon 8 core

      GPU NVIDIA k20c

      GPU AMD R9 280x

       

      The port is complete, but there is a problem with the R9 280x card.

      The problem is that its results are very different from the CPU's and NVIDIA's (those two are very close to each other).

      I wrote a simple set of kernels to show the different results:

       

      #pragma OPENCL EXTENSION cl_khr_fp64 : enable
      #define DIVISOR 1.0002349474678
      
      __kernel void div_kernel( __global double *in){
          in[0]+=pow(DIVISOR,10);
          barrier(CLK_GLOBAL_MEM_FENCE);
      }
      
      __kernel void div_kernel2( __global double *in){
          in[0]/=pow(DIVISOR,10);
          barrier(CLK_GLOBAL_MEM_FENCE);
      }
      
      __kernel void div_kernel3( __global double *in){
          in[0]=in[0]/DIVISOR;
          barrier(CLK_GLOBAL_MEM_FENCE);
      }
      
      

       

      And these are the results after 1000 executions:

      OpenCL on CPU:

      div_kernel = 101002.351960

      div_kernel2 = 9544.561027

      div_kernel3 = 79063.419905

       

      OpenCL on NVIDIA:

      div_kernel = 101002.351960

      div_kernel2 = 9544.561027

      div_kernel3 = 79063.419905

       

      OpenCL on AMD:

      div_kernel = 101007.072489

      div_kernel2 = 86.949658

      div_kernel3 = 79063.419905

       

      As you can see, plain division looks fine (div_kernel3 matches on all three devices), but pow seems to return an inaccurate value on AMD. Used in combination with division and repeated 1000 times, that error gets amplified.

      The code is attached in case anyone wants to try these kernels themselves (tested on Linux and Mac).


      My question is: why do the results differ between NVIDIA and AMD?

      Thanks in advance to everyone.

       

       

      EDIT:

      If anyone has an AMD card and would like to help me find a solution, you can simply compile and run the code and then report the results.

      Thanks!

       

      PS. I'm using Ubuntu 14.04 x86_64 with Catalyst-14.9.