
nberger
Adept I

Crashes of reduction kernels for certain stream lengths

Hi!
I have a simple reduction kernel that just sums up the values in a stream. For certain stream lengths the result is as expected; for others the program crashes. The first few non-working stream lengths are
11, 13, 17, 19, 22, 23, 26, 29, 31, 33...

Cheers

Nik
0 Likes
6 Replies
udeepta
Staff

Nik,

There are constraints on the stream size for a reduction kernel -- the prime factorization of the stream size may contain only 2, 3, 5 and 7. Sizes that are multiples of any prime of 11 or greater will not work in a reduction.

We are improving our documentation -- this will be properly noted in there.

udeepta

0 Likes
nberger
Adept I

So if this is supposed to be a "feature", is there any way to pad a stream with zeros without copying it to a new stream?
Would it be possible to do something like that automatically, as doing a prime factorization every time I create a stream seems a bit painful?
0 Likes

Nik,

Currently, it is the user's responsibility to verify that the stream size satisfies the above constraint when the stream is used in a reduction kernel. One can either test the constraint during stream creation, or copy to a padded stream just before calling the reduction.

Which padding value to use depends on the user's reduction kernel, though. For example, if we have a reduction that calculates the minimum value {b = min(b,a);} , we would want to pad with a very large number instead of zero, so that the padding can never become the result.
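To make the point about the padding value concrete, here is a minimal host-side sketch in plain C. It is not Brook+ code: `pad_for_reduction`, `cpu_sum` and `cpu_min` are hypothetical helper names, and the two `cpu_` functions are only CPU stand-ins for the sum and minimum reduction kernels.

```c
#include <float.h>
#include <stddef.h>

/* Copy n values from src into dst (capacity padded_n) and fill the tail
   with pad_value. Returns 0 on success, -1 if dst is too small.
   Hypothetical helper, not part of any SDK. */
int pad_for_reduction(const float *src, size_t n,
                      float *dst, size_t padded_n, float pad_value)
{
    size_t i;
    if (padded_n < n)
        return -1;
    for (i = 0; i < n; ++i)
        dst[i] = src[i];
    for (i = n; i < padded_n; ++i)
        dst[i] = pad_value;
    return 0;
}

/* CPU stand-in for a sum reduction: the neutral padding value is 0. */
float cpu_sum(const float *a, size_t n)
{
    float b = 0.0f;
    size_t i;
    for (i = 0; i < n; ++i)
        b += a[i];
    return b;
}

/* CPU stand-in for a min reduction: the neutral padding value is FLT_MAX. */
float cpu_min(const float *a, size_t n)
{
    float b = FLT_MAX;
    size_t i;
    for (i = 0; i < n; ++i)
        if (a[i] < b)
            b = a[i];
    return b;
}
```

Padding {4, 2, 9} to length 4 with 0 leaves the sum at 15 but turns the minimum into 0, while padding with FLT_MAX keeps the minimum at 2.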

Creating a small function that checks whether there is any prime factor > 7 shouldn't be difficult -- I will create one when I find some time. 🙂
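A check of this kind could look roughly like the following host-side C sketch. The function name `is_valid_reduction_size` is hypothetical, and treating size 1 as valid is an assumption (it simply has no prime factors); the constraint itself is the one stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if n has no prime factor other than 2, 3, 5 and 7.
   Assumption: n == 1 is treated as valid, n == 0 as invalid. */
bool is_valid_reduction_size(unsigned int n)
{
    static const unsigned int small_primes[] = {2, 3, 5, 7};
    size_t i;

    if (n == 0)
        return false;

    /* Divide out every factor of 2, 3, 5 and 7. */
    for (i = 0; i < sizeof small_primes / sizeof small_primes[0]; ++i)
        while (n % small_primes[i] == 0)
            n /= small_primes[i];

    /* Anything left over is a product of primes >= 11. */
    return n == 1;
}
```

For the lengths Nik listed (11, 13, 17, 19, 22, 23, 26, 29, 31, 33) it returns false, and for the lengths in between, such as 12, 14 or 32, it returns true.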

0 Likes
nberger
Adept I

Is there an easy way to do the padding without copying the stream to main memory and then copying it element-wise to a new, sufficiently long stream?
0 Likes
Ceq
Journeyman III

Certainly it would be handy to have a better way than copying the whole vector again...
I think some kind of substream selection command would do, for example:

float A<8>;
float subA(A, 1, 6);

subA would 'point' to the same stream but without the first and last element, which is useful in some kernels where you would otherwise have to write a boundary check that could slow down computation.
This way, instead of copying to a new padded stream, you can use a bigger stream with a valid size for the reduction, and just use the selection for standard kernels.

0 Likes

Exactly. You can use the domain feature -- A.domain(1,7) is a stream from A(1) to A(6); the end index, 7, is not included.

float A<8>;
float B<6>;
mycopy(A.domain(1,7), B);

//kernel void mycopy( float a<>, out float b<> ) { b=a;}

You do not need to copy anything in order to pad a stream. If you declare a "large enough" stream to begin with, you can work only on the relevant part of it (using domain) until you need the padding -- at which point you pass the whole stream to the reduction kernel.
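One way to pick a "large enough" size up front is to round the required length up to the next size whose only prime factors are 2, 3, 5 and 7. The following is a host-side C sketch under that assumption; both function names are hypothetical and nothing here is a Brook+ API.

```c
#include <stdbool.h>

/* True if n factors into 2, 3, 5 and 7 only (n == 0 is rejected). */
static bool factors_into_2_3_5_7(unsigned int n)
{
    if (n == 0)
        return false;
    while (n % 2 == 0) n /= 2;
    while (n % 3 == 0) n /= 3;
    while (n % 5 == 0) n /= 5;
    while (n % 7 == 0) n /= 7;
    return n == 1;
}

/* Smallest size >= n that passes the check above.
   Assumes n is far below UINT_MAX, so the increment cannot wrap. */
unsigned int next_valid_reduction_size(unsigned int n)
{
    if (n == 0)
        n = 1;
    while (!factors_into_2_3_5_7(n))
        ++n;
    return n;
}
```

For example, a stream that needs 11 elements would be declared with 12, and one that needs 33 with 35; the real data is then addressed through domain and the remaining elements hold the padding value.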

0 Likes