Hello, newbie here.
I want to compute a dot product like in this tutorial: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-c...
My operation looks more like this:
float reduce_sum(double* inputA, double* inputB, int length) {
    float accumulator = 0.0;
    /* sum of the element-wise products, i.e. the dot product */
    for (int i = 0; i < length; i++)
        accumulator += inputA[i] * inputB[i];
    return accumulator;
}
But the tutorial is far too complex for me for now, so here is my current kernel:
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable

void MyAtomicAdd(__global double *val, double delta) {
    union {double f; ulong i;} old;
    union {double f; ulong i;} new;
    do {
        old.f = *val;
        new.f = old.f + delta;
    } while (atom_cmpxchg((volatile __global ulong *)val, old.i, new.i) != old.i);
}

__kernel void inner(__global double *out, __global double *x, __global double *p)
{
    int gid = get_global_id(0);
    MyAtomicAdd(out, x[gid]*p[gid]);
}
It works, but it is very slow (2 seconds). So I thought I could use:
atom_add((volatile __global ulong *)out, (ulong) (x[gid]*p[gid]))
instead of MyAtomicAdd, but the output is wrong, so I guess that ulong and double having the same size is not enough.
Also, if someone could tell me how to do an atomic add with double, I would be very grateful, because Google is not very helpful in this case.
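To show what I suspect goes wrong with the atom_add attempt, here is a small host-side sketch in plain C (not OpenCL). The function names are made up for illustration, and it assumes a 64-bit IEEE double and non-negative products. As far as I understand, the (ulong) cast converts the value to an integer, and the atomic then does an integer addition on the raw bits of the double accumulator:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same size, but not the same encoding. */
static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");

/* Value conversion, like (ulong)(x[gid]*p[gid]): the fractional part is dropped.
   Only meaningful for non-negative values that fit in 64 bits. */
uint64_t value_cast(double d) {
    return (uint64_t)d;
}

/* Bit reinterpretation, like the union used in MyAtomicAdd. */
uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

double double_from_bits(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Hypothetical stand-in for the atom_add attempt: an integer addition
   applied to the raw bits of a double accumulator. */
double integer_add_on_bits(double acc, double product) {
    return double_from_bits(bits_of(acc) + value_cast(product));
}
```

With this sketch, integer_add_on_bits(1.0, 3.0) does not give 4.0: it only moves 1.0 up by three of the smallest representable steps. That would explain the wrong output, and why the compare-and-swap loop is needed for a double.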
Can you help me?
Thank you for any advice.