Nik,
There are constraints on the stream size for a reduction kernel -- the prime factorization of the stream size can contain only 2, 3, 5 and 7. Any size that is divisible by a prime of 11 or greater will not work in a reduction.
We are improving our documentation -- this will be properly noted in there.
udeepta
Nik,
Currently, it is the user's responsibility to verify that the stream size satisfies the above constraint when the stream is used in a reduction kernel. One can either test the constraint during stream creation, or copy to a padded stream just before calling the reduction.
The padding value to use depends on the user's reduction kernel. For example, if we have a reduction that calculates the minimum value {b = min(b,a);}, we might want to pad with a very large number instead of zero, so that the padding can never become the result.
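To illustrate why the padding value matters, here is a minimal host-side sketch in plain C. It is not Brook+ code and not part of any SDK: pad_copy and reduce_min are hypothetical helper names, and reduce_min is only a CPU stand-in for the min reduction kernel.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: copy n values into a buffer of padded_n elements
   and fill the remaining tail with pad_value. */
void pad_copy(const float *src, size_t n,
              float *dst, size_t padded_n, float pad_value)
{
    size_t i;
    assert(padded_n >= n);
    for (i = 0; i < n; ++i)
        dst[i] = src[i];
    for (i = n; i < padded_n; ++i)
        dst[i] = pad_value;
}

/* CPU stand-in for a min reduction, i.e. b = min(b, a) over all elements. */
float reduce_min(const float *data, size_t n)
{
    size_t i;
    float b;
    assert(n > 0);
    b = data[0];
    for (i = 1; i < n; ++i)
        if (data[i] < b)
            b = data[i];
    return b;
}
```

Padding an 11-element buffer of positive values up to 12 with FLT_MAX leaves the minimum unchanged, while padding with 0.0f makes the reduction return 0. The same reasoning would suggest a very small value for a max reduction and 0 for a sum.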
Creating a small function that checks whether there is any prime factor > 7 shouldn't be difficult -- I will create one when I find some time. 🙂
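In the meantime, a check along these lines could look like the sketch below, written as plain host-side C. The function names are made up for illustration and are not part of Brook+. Treating a size of 1 as valid and 0 as invalid is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if n has no prime factor other than 2, 3, 5 and 7, else 0.
   Assumption: 1 counts as valid, 0 does not. */
int is_valid_reduction_size(size_t n)
{
    static const size_t primes[] = {2, 3, 5, 7};
    size_t i;
    if (n == 0)
        return 0;
    /* Divide out every allowed prime; anything left over is a factor >= 11. */
    for (i = 0; i < sizeof primes / sizeof primes[0]; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest valid size that is >= n, found by linear search.
   Useful for choosing a padded stream size. */
size_t next_valid_reduction_size(size_t n)
{
    assert(n > 0);
    while (!is_valid_reduction_size(n))
        ++n;
    return n;
}
```

For example, 11 is rejected and the next valid size is 12, while 210 (2*3*5*7) is accepted as is. The linear search is simple rather than fast, but valid sizes are dense enough that it should only take a few steps for typical stream sizes.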
Exactly. You can use the domain feature -- A.domain(1,7) is a stream from A(1) to A(6); the end index itself is not included.
float A<8>;
float B<6>;
mycopy(A.domain(1,7), B);
//kernel void mycopy( float a<>, out float b<> ) { b=a;}
You do not need to do the copying for padding a stream. If you declare a "large enough" stream to begin with, you can work only on the relevant part of it (using domain) until you need to do the padding -- at which point you pass the whole stream to the kernel.