tgm@ncic.ac.cn

Performance of gather and scatter

Discussion created by tgm@ncic.ac.cn on Feb 26, 2009
Latest reply on Apr 10, 2009 by arros123

The following MD (molecular dynamics) function uses both gather and scatter. Its performance on the HD4870 is extremely poor: while it runs, even the keyboard and mouse stop responding for several seconds. Why?

In one test case, N_vec4 = int4(10000, 2500, 96, 100000) and ng_vec4 = int4(5, 5, 17, 0). The streams pos<> and tag[] have 10000 elements, bucket[] has 40800, ne[] has 26, and nnlist[] has 25920000.

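kernel void streamNeigh(
    int4 N_vec4, int4 ng_vec4,   // N_vec4.y: max neighbours per particle, N_vec4.z: slots per cell bucket; ng_vec4.xyz: cell grid size
    float4 pos<>,
    int4 tag[], int bucket[], int4 ne[],
    out int nnlist[]
    )
{
  int i, j, k;
  int ix, iy, iz;
  int id;
  int x, y, z;
  int a, na, o;
  int pnt, boff;
  int ind = instance().x;        // particle handled by this thread
  int stride = N_vec4.y+1;       // row layout: [count, id_0, id_1, ...]
  int offset = ind * stride;
  float4 p = pos;                // not used below

  pnt = 0;
  a = tag[ind].y;                // cell holding this particle
  iy = a%ng_vec4.y;              // a = iz*ng.x*ng.y + ix*ng.y + iy
  ix = (a/ng_vec4.y)%ng_vec4.x;
  iz = a/ng_vec4.x/ng_vec4.y;
  k = tag[ind].z;                // this particle's slot in its own bucket
  boff = a*N_vec4.z;             // each cell owns N_vec4.z consecutive bucket slots
  if (k < N_vec4.z && tag[ind].x != -1) {
    // gather: other active particles in the same cell
    for (j = 0; j < N_vec4.z; j+=1) {
      if (j != k && (id = bucket[boff+j]) != -1) {
        if (tag[id].x != -1) {
          nnlist[offset+1+pnt] = id;   // scatter write
          pnt+=1;
        }
      }
    }
    // gather: active particles in the 26 surrounding cells, with periodic wrap-around
    for (j = 0; j < 26; j+=1) {
      x = ix + ne[j].x;
      y = iy + ne[j].y;
      z = iz + ne[j].z;
      na = (z+ng_vec4.z)%ng_vec4.z*ng_vec4.x*ng_vec4.y+(x+ng_vec4.x)%ng_vec4.x*ng_vec4.y+(y+ng_vec4.y)%ng_vec4.y;
      boff = na * N_vec4.z;
      for (o = 0; o < N_vec4.z; o+=1) {
        if ((id = bucket[boff+o]) != -1) {
          if (tag[id].x != -1) {
            nnlist[offset+1+pnt] = id; // scatter write
            pnt+=1;
          }
        }
      }
    }
  }
  nnlist[offset] = pnt;          // neighbour count goes in slot 0 of the row
}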
