

MatAle
Journeyman III

Synchronization problem??

Hello.

I have a small problem with the accuracy of my calculations. The kernel (code below) returns different results for the same input data. The differences are small.

For example (the program's output):

GRMS: 70.764435 (first run), 70.583441 (second run), 70.563168, ...

The results are stored in dx[N], dy[N], dz[N] and R[N]. GRMS is the average value of dx[N], dy[N] and dz[N].

X[N], Y[N] and Z[N] are the atom coordinates and C[N] holds the atom charges (input data).

Does anyone have an idea what could be wrong? Why are there such small differences between runs?

 

 

__kernel void vector_add(__global float *X, __global float *Y, __global float *Z,
                         __global float *C, __global float *R, const int limit,
                         __global float *dx, __global float *dy, __global float *dz)
{
    int i = get_global_id(0);
    int n = 0;
    float distance = 0.0;
    float e, tx, ty, tz, stx, sty, stz;
    float cutoff = 10.0;
    float cutx, f, cg, tf, rr;

    R[i] = 0;
    dx[i] = 0;
    dy[i] = 0;
    dz[i] = 0;
    dx[i+1] = 0;
    dy[i+1] = 0;
    dz[i+1] = 0;
    cutx = cutoff*cutoff;

    for (n = i+1; n < limit; n++) {
        stx = 0.0, sty = 0.0, stz = 0.0;
        distance = sqrt((pow(X[i]-X[n],2) + pow(Y[i]-Y[n],2) + pow(Z[i]-Z[n],2)));
        if (distance <= cutoff) {
            rr = distance*distance;
            f = 1.0 - rr/cutx;
            cg = (C[i]*C[n])/distance;
            e = cg*f*f;
            R[i] += e;
            tf = -(e/rr) - (4.0*cg*f)/cutx;
            tx = tf*(X[i]-X[n]);
            ty = tf*(Y[i]-Y[n]);
            tz = tf*(Z[i]-Z[n]);
            dx[i] = dx[i] + tx;
            dy[i] = dy[i] + ty;
            dz[i] = dz[i] + tz;
            stx = stx + tx;
            sty = sty + ty;
            stz = stz + tz;
        }
        dx[n] = dx[n] - stx;
        dy[n] = dy[n] - sty;
        dz[n] = dz[n] - stz;
    }
}

0 Likes
5 Replies
nou
Exemplar

Why do you write dx[i+1] = 0 ...?

0 Likes
maximmoroz
Journeyman III

It is a design problem.

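The work-item with global id i reads and writes the buffers dx, dy and dz at indexes in [i, limit], without any synchronization. Other work-items touch the same elements at the same time, so the result depends on the order in which they happen to run. It is a complete mess.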

0 Likes

Thanks. I tried barriers (CLK_GLOBAL_MEM_FENCE), but they did not help. Do you have any advice on what I could use besides the variable "i" to synchronize the implementation?

0 Likes

You cannot synchronize between work groups. Synchronization only happens among work items within a work group. You need to redesign your algorithm so that it doesn't need synchronization and so that work items can execute in any order and you still get correct results.

0 Likes

Parallelize the algorithm. Make each work-item independent of all the others.

0 Likes