Archives Discussions

MatAle · ‎08-10-2011

Hello.

I have a small problem with the accuracy of calculations. The kernel (code below) returns different results for the same input data. The difference is minimal.

For example (the program's output):

GRMS: 70.764435 (first run), 70.583441 (second run), 70.563168, ...

The results are stored in dx[N], dy[N], dz[N] and R[N]. GRMS is the average value of dx[N], dy[N], dz[N].

X[N], Y[N] and Z[N] are the atom coordinates and C[N] holds the atom charges (input data).

Does anyone have an idea what could be wrong? Why is there such a small difference in the calculations?

__kernel void vector_add(__global float *X, __global float *Y, __global float *Z,
                         __global float *C, __global float *R, const int limit,
                         __global float *dx, __global float *dy, __global float *dz)
{
    int i = get_global_id(0);
    int n = 0;
    float distance = 0.0;
    float e, tx, ty, tz, stx, sty, stz;
    float cutoff = 10.0;
    float cutx, f, cg, tf, rr;

    R[i] = 0;
    dx[i] = 0; dy[i] = 0; dz[i] = 0;
    dx[i+1] = 0; dy[i+1] = 0; dz[i+1] = 0;
    cutx = cutoff * cutoff;

    /* loop over all atoms n that follow atom i */
    for (n = i + 1; n < limit; n++) {
        stx = 0.0, sty = 0.0, stz = 0.0;
        distance = sqrt((pow(X[i] - X[n], 2) + pow(Y[i] - Y[n], 2) + pow(Z[i] - Z[n], 2)));
        if (distance <= cutoff) {
            rr = distance * distance;
            f = 1.0 - rr / cutx;
            cg = (C[i] * C[n]) / distance;
            e = cg * f * f;
            R[i] += e;
            tf = -(e / rr) - (4.0 * cg * f) / cutx;
            tx = tf * (X[i] - X[n]);
            ty = tf * (Y[i] - Y[n]);
            tz = tf * (Z[i] - Z[n]);
            /* contribution to atom i */
            dx[i] = dx[i] + tx;
            dy[i] = dy[i] + ty;
            dz[i] = dz[i] + tz;
            stx = stx + tx;
            sty = sty + ty;
            stz = stz + tz;
        }
        /* opposite contribution to atom n */
        dx[n] = dx[n] - stx;
        dy[n] = dy[n] - sty;
        dz[n] = dz[n] - stz;
    }
}

nou · ‎08-10-2011

Why do you write dx[i+1] = 0 ...?

maximmoroz · ‎08-10-2011

It is a design problem.

The work-item with global id i reads and writes the buffers dx, dy and dz at indexes [i, limit], without any synchronization, so several work-items update the same elements at the same time. It is a complete mess.

MatAle · ‎08-11-2011

Thanks. I tried barriers (CLK_GLOBAL_MEM_FENCE), but it does not help. Do you have any advice on what I can use besides the variable "i" to synchronize the implementation?

rick_weber · ‎08-11-2011

You cannot synchronize between work groups. Synchronization only happens among work items within a work group. You need to redesign your algorithm so that it doesn't need synchronization and so that work items can execute in any order and you still get correct results.

maximmoroz · ‎08-11-2011

Parallelize the algorithm. Make each work-item independent of every other.

Synchronization problem??