CPU and GPU outputs don't match even for a single-threaded GPU execution

Question asked by thejascr on Nov 3, 2012
Latest reply on Nov 4, 2012 by thejascr

I am trying to run the following piece of LU decomposition code in an OpenCL kernel.

'A' below is a single-precision floating-point array.


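    for (k = 0; k < N; k++) {
        /* Scale the part of row k to the right of the pivot A[k][k]. */
        for (j = k + 1; j < N; j++) {
            A[k * N + j] = A[k * N + j] / A[k * N + k];
        }
        /* Update the trailing submatrix below and to the right of the pivot. */
        for (i = k + 1; i < N; i++) {
            for (j = k + 1; j < N; j++) {
                A[i * N + j] = A[i * N + j] - (A[i * N + k] * A[k * N + j]);
            }
        }
    }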

I am running this code on the GPU on just a single thread (completely sequential), so I set the global and local thread sizes for the kernel as follows.


    globalthread[0] = 1;
    globalthread[1] = 1;
    localthread[0] = 1;
    localthread[1] = 1;


But when I compare the GPU output to the output of the same function run on the CPU (directly, not as an OpenCL device), the outputs don't match.

I could not explain this despite my best efforts. While trying to narrow down the problem, I found that it arises from the second statement, specifically from the subtraction, when the value of A[i][j] goes negative.

I have made sure that both the CPU and the GPU are working on the same inputs. Such a discrepancy in such a simple computation seems strange. Can anyone help explain why the outputs might differ?

I also ran it on both an AMD device and an NVIDIA device and I see the same behavior on both (to rule out any platform-specific issue).
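
One check that may help narrow this down further is to see how sensitive the input is to rounding on the CPU alone. This is only a sketch under an assumption the post does not establish: that the mismatch could come from small rounding differences being amplified by the elimination, which uses no pivoting. The code below runs the same loop nest in single and in double precision and reports the largest relative gap between the two results. The names `lu_float`, `lu_double`, and `max_relative_gap` are hypothetical, and the caller supplies the two N*N work buffers.

```c
#include <assert.h>
#include <stddef.h>

/* The loop nest from the post, single precision, row-major N x N. */
void lu_float(float *A, size_t N)
{
    for (size_t k = 0; k < N; k++) {
        for (size_t j = k + 1; j < N; j++)
            A[k * N + j] = A[k * N + j] / A[k * N + k];
        for (size_t i = k + 1; i < N; i++)
            for (size_t j = k + 1; j < N; j++)
                A[i * N + j] = A[i * N + j] - (A[i * N + k] * A[k * N + j]);
    }
}

/* The same loop nest in double precision, used as a reference. */
void lu_double(double *A, size_t N)
{
    for (size_t k = 0; k < N; k++) {
        for (size_t j = k + 1; j < N; j++)
            A[k * N + j] = A[k * N + j] / A[k * N + k];
        for (size_t i = k + 1; i < N; i++)
            for (size_t j = k + 1; j < N; j++)
                A[i * N + j] = A[i * N + j] - (A[i * N + k] * A[k * N + j]);
    }
}

/* Hypothetical helper: factor the same input in float and in double and
 * return the largest relative difference between the two results.
 * Entries whose double-precision result is exactly zero are skipped. */
double max_relative_gap(const float *input, float *work_f, double *work_d, size_t N)
{
    assert(input != NULL && work_f != NULL && work_d != NULL);
    for (size_t idx = 0; idx < N * N; idx++) {
        work_f[idx] = input[idx];
        work_d[idx] = (double)input[idx];
    }
    lu_float(work_f, N);
    lu_double(work_d, N);

    double worst = 0.0;
    for (size_t idx = 0; idx < N * N; idx++) {
        double ref = work_d[idx];
        double diff = (double)work_f[idx] - ref;
        if (diff < 0.0) diff = -diff;
        if (ref < 0.0) ref = -ref;
        if (ref > 0.0 && diff / ref > worst)
            worst = diff / ref;
    }
    return worst;
}
```

If the gap between the float and double results on the CPU is already in the percent range for this input, that would suggest the computation itself is sensitive to rounding, and small differences in how each device rounds the division or the multiply-subtract might be enough to explain the mismatch. If the gap is tiny, the cause is more likely elsewhere.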


Here is an example output:


platform name is NVIDIA CUDA
platform version is OpenCL 1.1 CUDA 4.2.1
number of devices is 2
device name is Tesla C2050 / C2070
GPU Runtime: 0.023669s
CPU Runtime: 0.000123s
Values differ at index (45, 40): cpu_val=0.946256, gpu_val=0.963078
Values differ at index (60, 52): cpu_val=-9.348129, gpu_val=-9.483719
Values differ at index (61, 52): cpu_val=11.343384, gpu_val=11.093756
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 1.05 Percent: 3
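
The comparison code is not shown in this post. A minimal sketch of a per-element check against a percentage threshold might look like the following; the function name `count_mismatches` and the relative-error formula (difference divided by the magnitude of the CPU value) are assumptions, not the code that produced the output above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of two n x n row-major arrays whose
 * relative difference, measured against the CPU value, exceeds
 * threshold_percent. When the CPU value is exactly zero, any nonzero
 * difference is counted as a mismatch. */
size_t count_mismatches(const float *cpu, const float *gpu, size_t n,
                        double threshold_percent)
{
    assert(cpu != NULL && gpu != NULL);
    size_t mismatches = 0;
    for (size_t idx = 0; idx < n * n; idx++) {
        double ref = (double)cpu[idx];
        double diff = (double)gpu[idx] - ref;
        if (diff < 0.0) diff = -diff;
        if (ref < 0.0) ref = -ref;

        if (ref > 0.0) {
            if (100.0 * diff / ref > threshold_percent)
                mismatches++;
        } else if (diff > 0.0) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Called with the CPU and GPU result arrays and a threshold of 1.05, this returns the number of entries that differ by more than 1.05 percent. A relative check of this kind becomes very strict for entries close to zero, so an absolute tolerance may be worth adding for those.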