3 Replies Latest reply on Oct 28, 2009 12:30 PM by gaurav.garg

    scatter + domainSize problems

    hennequi

      Good morning,

      I'm having problems with a simple scatter kernel for which I want to restrict the domainSize.

      Here is a simple kernel that does what I want:

      kernel void kernel_one(unsigned int m, float4 a<>, out float b[][]){
        int k = instance().x;
        unsigned int t;
        for(t=0; t<m; t++){
          b[4*k+0][t] = a.x + (float)t;
          b[4*k+1][t] = a.y + (float)t;
          b[4*k+2][t] = a.z + (float)t;
          b[4*k+3][t] = a.w + (float)t;
        }
      }

      It takes an input stream a of n/4 float4s and scatters it into a big n x m matrix b, such that column t contains ((vector a) + t).
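
      For comparison, the intended result can be written as a CPU reference in plain C (not Brook+). This is only a sketch: the name scatter_reference and the row-major layout of b on the host are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the scatter in kernel_one (plain C, not Brook+).
 * a: n floats, viewed as n/4 float4 elements.
 * b: n*m floats, row-major, so b[row*m + t] stands for b[row][t]. */
static void scatter_reference(unsigned int n, unsigned int m,
                              const float *a, float *b)
{
    unsigned int k, c, t;
    assert(n % 4 == 0);               /* the stream holds whole float4s */
    for (k = 0; k < n / 4; k++) {     /* one iteration per kernel instance */
        for (t = 0; t < m; t++) {
            for (c = 0; c < 4; c++) { /* components x, y, z, w */
                unsigned int row = 4 * k + c;
                b[(size_t)row * m + t] = a[row] + (float)t;
            }
        }
    }
}
```

      Assuming a holds 0, 1, ..., n-1 (as the output tables in this post suggest), row i of b should read i, i+1, ..., i+m-1.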

      Prior to calling the kernel, I always set the domainSize to n/4 (i.e. the size of the input stream).

      This works fine.

      Now, if I add a second, dummy output stream float4 c<> (a normal, non-scatter output), things go wrong:

      kernel void kernel_two(unsigned int m, float4 a<>, out float4 c<>, out float b[][]){
        int k = instance().x;
        unsigned int t;
        for(t=0; t<m; t++){
          b[4*k+0][t] = a.x + (float)t;
          b[4*k+1][t] = a.y + (float)t;
          b[4*k+2][t] = a.z + (float)t;
          b[4*k+3][t] = a.w + (float)t;
        }
        c = a;
      }

      (c does nothing but copy a)

      Now, for n = 256, m = 5, the result is correct:

      0   1   2   3   4

      1   2   3   4   5

      2   3   4   5   6

      ......

      255 256 257 258 259

      But for n = 260, m = 5, it breaks down:

      0   1   2   3   4

      0   0   0   0   0

      1   2   3   4   5

      0   0   0   0   0

      .......

      129 130 131 132 133

      0   0   0   0   0

       

      ??
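
      To pin down where the output first diverges, the matrix can be checked on the host against the intended value b[row][t] = a[row] + t. The following is a minimal plain C sketch, not Brook+ code; the name first_bad_row is hypothetical, and it assumes b has already been copied back into a row-major host array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first row of b that differs from the
 * intended result b[row][t] = a[row] + t, or -1 if every row matches.
 * b is an n-by-m row-major host copy of the scatter output.
 * Exact float comparison is acceptable here only because the values
 * involved are small integers, which floats represent exactly. */
static int first_bad_row(unsigned int n, unsigned int m,
                         const float *a, const float *b)
{
    unsigned int row, t;
    assert(a != NULL && b != NULL);
    for (row = 0; row < n; row++)
        for (t = 0; t < m; t++)
            if (b[(size_t)row * m + t] != a[row] + (float)t)
                return (int)row;
    return -1;
}
```

      Applied to the n = 260 output listed above, with a holding 0, 1, 2, ..., such a check would stop at row 1, the first all-zero row.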

      In fact, the first kernel works fine on the GPU, but when I switch to the CPU backend I get a segfault, which gdb traces back to:

      #0  0x00007f111651f372 in brt::CPUKernel::Map () from /opt/atibrook/sdk/lib/libbrook.so

      So, should I conclude that having one scatter output plus one normal output stream is implicitly not allowed?

      Thanks for the help,

      Guillaume