musale87
Adept I

Maths in double precision gives me different results on AMD and NVIDIA

Hi everyone,

I ported a Cellular Automaton from CPU to GPU. I'm using OpenCL and have the following hardware:

CPU Xeon 8 core

GPU NVIDIA k20c

GPU AMD R9 280x

The port is complete, but there is a problem with the R9 280x card.

The results are very different from the CPU's and the NVIDIA card's (those two are very close to each other).

I have made a simple kernel to show you the different results:


#pragma OPENCL EXTENSION cl_khr_fp64 : enable

#define DIVISOR 1.0002349474678

__kernel void div_kernel(__global double *in) {
    in[0] += pow(DIVISOR, 10);
    barrier(CLK_GLOBAL_MEM_FENCE);
}

__kernel void div_kernel2(__global double *in) {
    in[0] /= pow(DIVISOR, 10);
    barrier(CLK_GLOBAL_MEM_FENCE);
}

__kernel void div_kernel3(__global double *in) {
    in[0] = in[0] / DIVISOR;
    barrier(CLK_GLOBAL_MEM_FENCE);
}



And these are the results after 1000 executions:

OpenCL on CPU:

div_kernel = 101002.351960

div_kernel2 = 9544.561027

div_kernel3 = 79063.419905

OpenCL on NVIDIA:

div_kernel = 101002.351960

div_kernel2 = 9544.561027

div_kernel3 = 79063.419905

OpenCL on AMD:

div_kernel = 101007.072489

div_kernel2 = 86.949658

div_kernel3 = 79063.419905

As you can see, the plain division seems OK, but the pow function returns a slightly different value on AMD, and repeating the division 1000 times amplifies that error.
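For anyone who wants to check the expected values without a GPU, here is a minimal host-side sketch in plain C. It assumes the buffer starts at 100000.0 (inferred from the printed results, not stated in the post) and that each kernel runs 1000 times with a single work-item. The ref_* function names are hypothetical and only for illustration.

```c
#include <math.h>

/* Divisor copied from the kernel source in this post. */
#define DIVISOR 1.0002349474678

/* Mirrors div_kernel: adds pow(DIVISOR, 10) once per launch. */
double ref_add_pow(double start, int launches) {
    double v = start;
    for (int i = 0; i < launches; ++i)
        v += pow(DIVISOR, 10.0);
    return v;
}

/* Mirrors div_kernel2: divides by pow(DIVISOR, 10) once per launch. */
double ref_div_pow(double start, int launches) {
    double v = start;
    for (int i = 0; i < launches; ++i)
        v /= pow(DIVISOR, 10.0);
    return v;
}

/* Mirrors div_kernel3: divides by DIVISOR once per launch. */
double ref_div(double start, int launches) {
    double v = start;
    for (int i = 0; i < launches; ++i)
        v = v / DIVISOR;
    return v;
}
```

Called with a start value of 100000.0 and 1000 launches, these should land close to the CPU and NVIDIA numbers above, which suggests those two are the correct ones.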

The code is attached in case anyone wants to try this kernel themselves (tested on Linux and Mac).


My question is: why do the results differ between NVIDIA and AMD?

Thanks to all in advance

EDIT:

If anyone has an AMD card and would like to help me find a solution, you can simply compile and run the code and then report the results.

Thanks!

PS. I'm using Ubuntu 14.04 x86_64 with Catalyst-14.9.

11 Replies
yurtesen
Miniboss

Re: Maths in double precision gives me different results on AMD and NVIDIA

Don't you think there would be race conditions when multiple work-items or work-groups write to the same memory location simultaneously? Try using an atomic add in div_kernel and see whether it gives the same results on all devices:

atom_add

bilal
Adept I

Re: Maths in double precision gives me different results on AMD and NVIDIA

Sorry, I didn't understand your problem correctly. But the built-in pow function has often given me very different results, which is why I avoid it, either by coding it explicitly (e.g. double myPow(arg, arg)) or by computing the power on the CPU and passing it in as a kernel argument.
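As a rough illustration of the "code it explicitly" option, here is a minimal sketch in plain C for the case where the exponent is a non-negative integer, as it is in the kernels above (the exponent is 10). The name pow_int is hypothetical. It uses exponentiation by squaring, so only multiplications are involved. OpenCL also provides a built-in pown for integer exponents, which may be worth trying first.

```c
/* Hypothetical helper: x raised to a non-negative integer power n,
   computed by exponentiation by squaring (multiplications only). */
double pow_int(double x, unsigned int n) {
    double result = 1.0;
    double base = x;
    while (n > 0) {
        if (n & 1u)          /* this bit of the exponent is set */
            result *= base;
        base *= base;        /* square the base for the next bit */
        n >>= 1;
    }
    return result;
}
```

The same function body should also compile as an OpenCL C helper once cl_khr_fp64 is enabled, though that has not been tested here.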

Your attached code gave me the following result. Note also that your code sets both the global and local NDRange sizes to 1, which won't give you any performance benefit.

Group 1 with 1 devices

Device 0

Platform name: NVIDIA CUDA

Vendor: NVIDIA Corporation

Device name: GeForce GTX 260

Bit Arch: 32

Extension: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64

Execution successful in 18208 millisec

result_k1: 101002.351960

result_k2: 9544.561027

result_k3: 79063.419905

musale87
Adept I

Re: Maths in double precision gives me different results on AMD and NVIDIA

Hi yurtesen,

Help me understand: why would there be a race condition if the global_size and the local_size are both 1?

Thanks.

Btw, using this modified kernel gives the same results:


__kernel void div_kernel2(__global double *in) {
    int pos = get_global_id(0);
    in[pos] /= pow(DIVISOR, 10);
    barrier(CLK_GLOBAL_MEM_FENCE);
}


musale87
Adept I

Re: Maths in double precision gives me different results on AMD and NVIDIA

Hi bilal,

This code is just an example to highlight the problem.

I know that using an NDRange of size 1 is not a good choice, but I set it to 1 just to have a single work-item.

Your results are correct because you are using an NVIDIA card.

musale87
Adept I

Re: Maths in double precision gives me different results on AMD and NVIDIA

The solution (suggested by a user of the GPGPU group on LinkedIn) was to write my own pow() function like this:


double my_pow(double x, double a) {
    // x^a = exp(a * log(x)); only valid for x > 0
    return exp(a * log(x));
}


It works very well and the results are the same as the NVIDIA ones.

bilal
Adept I

Re: Maths in double precision gives me different results on AMD and NVIDIA

Yes, that's what I pointed out in my reply.

mdriftmeyer
Adept II

Re: Maths in double precision gives me different results on AMD and NVIDIA



musale87 wrote:

The solution (suggested by a user of the GPGPU group on LinkedIn) was to write my own pow() function like this:

double my_pow(double x, double a) {
    return exp(a * log(x));
}

It works very well and the results are the same as the NVIDIA ones.


A far cleaner, simpler and more debuggable solution.

yurtesen
Miniboss

Re: Re: Maths in double precision gives me different results on AMD and NVIDIA

Well, sorry, it was my mistake. I didn't check your code attachment, so I didn't realize you enqueue the kernels 1000 times, etc.

Now I downloaded your kernel and ran it on Spectre, Hawaii, AMD A10-7850K with AMD SDK and Intel SDK. The machine was using linux-amd-catalyst-14.6-beta-v1.0-jul11.zip driver.

I have attached the output from your program. So there is no problem?

musale87
Adept I

Re: Re: Re: Maths in double precision gives me different results on AMD and NVIDIA

Thanks yurtesen!

Your result is good!

I seriously think I have a configuration or hardware problem: maybe the AMD driver conflicts with the NVIDIA one, or there is a problem with the power supply?!

Could you try one more thing, please? Replace pow(DIVISOR,10) with cos(DIVISOR) in the first kernel function and run it.

On my system this is the result on AMD:

result_k1: 99398.981831

but this is the right result:

result_k1: 100540.104589
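One way to sanity-check numbers like these is to back out the average value that was added per launch. Here is a minimal sketch in plain C, again assuming the buffer starts at 100000.0 and the kernel runs 1000 times (both inferred from the results in this thread, not stated). The name implied_increment is hypothetical.

```c
/* Average per-launch increment implied by an accumulated result. */
double implied_increment(double final_value, double start, int launches) {
    return (final_value - start) / (double)launches;
}
```

Under those assumptions, the "right" result implies an increment of about 0.5401, which agrees with cos(1.0002349474678) on the CPU, while the AMD result implies about -0.6010, so the cos value itself appears to be wrong rather than just slightly off in the last digits.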
