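I have an OpenCL kernel that computes dot products between two float arrays. The first array has size*n elements and the second has n elements. Each work-item multiplies one row of the first array by the second array, where element k of row gid is stored at index gid + k*size.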
Here is the kernel:
```c
__kernel void evaluate_product(__global const float *pFirstArray,
                               const int n,
                               __global const float *pSecondArray,
                               __global float *pOutput)
{
    int gid = get_global_id(0);      /* one work-item per output row */
    int size = get_global_size(0);   /* number of rows */
    if (gid >= 0 && gid < size)
    {
        float output = 0;
        for (int k = 0; k < n; k++)
            output += pFirstArray[gid + k*size] * pSecondArray[k];
        pOutput[gid] = output;
    }
}
```
If I execute the same operations on the CPU, I get different results, mostly after the 6th or 7th decimal digit. Why does this happen? The Khronos OpenCL specification (v1.2) states that x+y and x*y are correctly rounded and IEEE 754 compliant.