
traits
Journeyman III

Reduction kernel error

I'm testing the spmv sample in Brook+. When I change NzWidth from 2 to 10, it reports "Failed to find usable kernel fragment to implement requested reduction". When the factor is 8, it runs correctly.

I found this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).

Why?

4 Replies
josopait
Journeyman III

I get the same error with the following test program:

 

reduce void sum( float4 a<>, reduce float4 b<> )
{
    b += a;
}


int main()
{
    float4 a<76>;
    float4 b;
    sum(a, b);

    return 0;
}

 

It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.

 


Originally posted by: josopait
It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.


The prime factorization of 76 is 2*2*19.

19 is bad news (see post above).
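To make the factorization argument easy to check for other stream sizes, here is a minimal Python sketch. The function names `prime_factors` and `unsupported_factors` are made up for illustration and are not part of Brook+; the allowed set (2, 3, 5, 7) is taken from the rule quoted in this thread, not from the Brook+ source.

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever remains after trial division is itself prime.
        factors.append(n)
    return factors


def unsupported_factors(n, allowed=(2, 3, 5, 7)):
    """Return the prime factors of n outside the allowed set.

    The allowed set is an assumption based on the rule quoted in this thread.
    """
    return [p for p in prime_factors(n) if p not in allowed]


print(prime_factors(76))        # [2, 2, 19]
print(unsupported_factors(76))  # [19]
```

An empty list from `unsupported_factors` means the size satisfies the quoted rule; a non-empty list names the offending primes.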

lpw
Journeyman III

Originally posted by: traits
I find this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).

 

Why?

 

As mentioned by udeepta in the above thread, the prime factorization of the stream size can have only 2, 3, 5 and 7 as factors.

This is because a kernel can take up to 8 inputs. I suspect that Brook+ does reductions by recursively partitioning the input stream into up to 8 subdomains and, with each pass, attaching the subdomains as inputs to the reduction kernel (translated to IL). If at any pass the current input stream cannot be divided into 2, 3, 4, 5, 6, 7, or 8 parts Brook+ blows chunks.
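The multi-pass scheme suspected above can be modelled in a few lines of Python. This is only a sketch of the hypothesis in this post, not the actual Brook+ runtime logic: the function name `reduction_passes`, the `max_inputs` parameter, and the choice of always taking the largest usable divisor are all assumptions made for illustration.

```python
def reduction_passes(size, max_inputs=8):
    """Model the hypothesized multi-pass reduction.

    Each pass splits the current stream into k equal parts, 2 <= k <= max_inputs.
    Returns the list of k values used per pass, or None if some pass cannot
    be split evenly (the case where the reported error would occur).
    """
    passes = []
    while size > 1:
        # Try the largest fan-in first; any divisor in 2..max_inputs would do.
        fan_in = next(
            (k for k in range(max_inputs, 1, -1) if size % k == 0), None
        )
        if fan_in is None:
            return None  # no usable split for this pass
        passes.append(fan_in)
        size //= fan_in
    return passes


print(reduction_passes(64))  # [8, 8]
print(reduction_passes(76))  # None: 76 -> 19 after a 4-way split, then stuck
print(reduction_passes(10))  # [5, 2]
```

Under this model a size of 76 gets stuck at 19, which matches the failure reported above, while a size of 10 is accepted in two passes. So the hypothesis as stated does not by itself account for the failure seen when NzWidth is 10.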

traits
Journeyman III

Originally posted by: lpw
As mentioned by udeepta in the above thread, the prime factorization of the stream size can have only 2, 3, 5 and 7 as factors.

This is because a kernel can take up to 8 inputs. I suspect that Brook+ does reductions by recursively partitioning the input stream into up to 8 subdomains and, with each pass, attaching the subdomains as inputs to the reduction kernel (translated to IL). If at any pass the current input stream cannot be divided into 2, 3, 4, 5, 6, 7, or 8 parts Brook+ blows chunks.

Then why does it fail with 10? 10 = 2*5.
