# scatter + domainSize problems

Discussion created by hennequi on Oct 28, 2009
Latest reply on Oct 28, 2009 by gaurav.garg

Good morning,

I'm having problems with a simple scatter kernel for which I want to restrict the domainSize.

Here is a simple kernel that does what I want:

```
kernel void kernel_one(unsigned int m, float4 a<>, out float b[][]){
    int k = instance().x;
    unsigned int t;
    for(t=0; t<m; t++){
        b[4*k+0][t] = a.x + (float)t;
        b[4*k+1][t] = a.y + (float)t;
        b[4*k+2][t] = a.z + (float)t;
        b[4*k+3][t] = a.w + (float)t;
    }
}
```

It takes an input stream a of n/4 float4s and writes it into a big n x m matrix b, such that column t contains (vector a) + t.
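To have something to compare the device output against, here is a minimal CPU reference sketch in C of what kernel_one is meant to compute. It is an assumption-laden illustration, not Brook+ code: the `float4` struct is a plain host stand-in for Brook+'s built-in type, the matrix is a row-major n x m host buffer, and the function name `reference_kernel_one` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host stand-in for Brook+'s float4 (components in x, y, z, w order). */
typedef struct { float x, y, z, w; } float4;

/* Hypothetical CPU reference for kernel_one.
   a: n/4 float4 elements; b: row-major n x m host buffer (assumed layout).
   Component j of element k goes to row 4k+j, and b[row][t] = value + t. */
static void reference_kernel_one(const float4 *a, size_t n, size_t m, float *b)
{
    assert(n % 4 == 0); /* the input holds exactly n/4 float4 elements */
    for (size_t k = 0; k < n / 4; k++) {
        const float comp[4] = { a[k].x, a[k].y, a[k].z, a[k].w };
        for (size_t j = 0; j < 4; j++) {
            for (size_t t = 0; t < m; t++) {
                b[(4 * k + j) * m + t] = comp[j] + (float)t;
            }
        }
    }
}
```

With a = 0, 1, ..., n-1 this yields row i = i, i+1, ..., i+m-1, which is the pattern expected from the GPU.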

Before calling the kernel, I always set the domainSize to n/4 (i.e. the size of the input stream).

This works fine.

Now, if I add a dummy ordinary output stream float4 c<>, things go wrong:

```
kernel void kernel_two(unsigned int m, float4 a<>, out float4 c<>, out float b[][]){
    int k = instance().x;
    unsigned int t;
    for(t=0; t<m; t++){
        b[4*k+0][t] = a.x + (float)t;
        b[4*k+1][t] = a.y + (float)t;
        b[4*k+2][t] = a.z + (float)t;
        b[4*k+3][t] = a.w + (float)t;
    }
    c = a;
}
```

(c does nothing but copy a.)

Now, for n = 256 and m = 5, the result is correct:

```
0   1   2   3   4
1   2   3   4   5
2   3   4   5   6
...
255 256 257 258 259
```

But for n = 260 and m = 5, it breaks down:

```
0   1   2   3   4
0   0   0   0   0
1   2   3   4   5
0   0   0   0   0
...
129 130 131 132 133
0   0   0   0   0
```

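Every second row is all zeros, and the non-zero rows stop at 129 130 131 132 133 instead of reaching 259 260 261 262 263. Why?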
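To locate exactly where a returned matrix first departs from the expected a_i + t pattern (for example the zero rows in the n = 260 output), a small host-side check could be run on the result after reading it back. This is a sketch under assumptions: the output is copied into a row-major n x m host buffer, the input stream is viewed as n plain floats, and `first_bad_row` is a hypothetical helper, not part of Brook+.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first row i of the row-major
   n x m matrix b whose entries are not within tol of a[i] + t (t = 0..m-1),
   or -1 if every row matches. a is the input stream viewed as n floats. */
static long first_bad_row(const float *a, const float *b,
                          size_t n, size_t m, float tol)
{
    assert(tol >= 0.0f);
    for (size_t i = 0; i < n; i++) {
        for (size_t t = 0; t < m; t++) {
            float diff = b[i * m + t] - (a[i] + (float)t);
            /* written as !(within bounds) so that NaN entries are also flagged */
            if (!(diff <= tol && diff >= -tol)) {
                return (long)i;
            }
        }
    }
    return -1;
}
```

On the n = 260 output above, such a check would stop at row 1, the first all-zero row.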

In fact, the first kernel works fine on the GPU, but when I switch to the CPU backend I get a segfault, which gdb traces back to:

```
#0  0x00007f111651f372 in brt::CPUKernel::Map () from /opt/atibrook/sdk/lib/libbrook.so
```

So should I conclude that combining one scatter output with one ordinary output stream is implicitly not allowed?

Thanks for the help,

Guillaume