I'm testing the spmv sample in Brook+. When I change NzWidth from 2 to 10, it reports "Failed to find usable kernel fragment to implement requested reduction". With a value of 8, it runs correctly.
I found this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).
Why?
I get the same error with the following test program:
reduce void sum( float4 a<>, reduce float4 b<> )
{
    b += a;
}

int main()
{
    float4 a<76>;
    float4 b;
    sum(a, b);
    return 0;
}
It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.
Originally posted by: josopait
It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.
The prime factorization of 76 is 2*2*19.
19 is bad news (see post above).
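To check a stream size before running the reduction, something like the following plain C helper could be used on the host side. It is only a sketch of the rule of thumb discussed in this thread (all prime factors must be 7 or smaller); the function names `largest_prime_factor` and `size_looks_reducible` are made up for illustration and are not part of Brook+.

```c
#include <assert.h>

/* Largest prime factor of n (n >= 2), found by trial division. */
unsigned largest_prime_factor(unsigned n)
{
    assert(n >= 2);
    unsigned largest = 1;
    for (unsigned p = 2; p * p <= n; ++p) {
        while (n % p == 0) {   /* strip every copy of p from n */
            largest = p;
            n /= p;
        }
    }
    if (n > 1)                 /* what remains is a prime larger than any found so far */
        largest = n;
    return largest;
}

/* Rule of thumb from this thread (an assumption, not documented behaviour):
 * a size is fine if all of its prime factors are 2, 3, 5 or 7. */
int size_looks_reducible(unsigned n)
{
    return n == 1 || largest_prime_factor(n) <= 7;
}
```

For 76 the largest prime factor is 19, so `size_looks_reducible(76)` returns 0, while 80 = 2*2*2*2*5 passes the check.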
Originally posted by: traits
I found this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).
Why?
As mentioned by udeepta in the above thread, the prime factorization of the stream size can have only 2, 3, 5 and 7 as factors.
This is because a kernel can take up to 8 inputs. I suspect that Brook+ does reductions by recursively partitioning the input stream into up to 8 subdomains and, with each pass, attaching the subdomains as inputs to the reduction kernel (translated to IL). If at any pass the current input stream cannot be divided evenly into 2, 3, 4, 5, 6, 7, or 8 parts, Brook+ gives up with the error above.
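The suspected scheme can be modelled in a few lines of plain C. This is only a simulation of the guess above, not the actual Brook+ implementation: the name `count_reduction_passes`, the constant `MAX_KERNEL_INPUTS`, and the greedy choice of the largest usable split are all assumptions made for illustration.

```c
#include <assert.h>

#define MAX_KERNEL_INPUTS 8   /* input limit quoted above */

/* Hypothetical model: each pass splits a stream of n elements into k equal
 * parts, 2 <= k <= MAX_KERNEL_INPUTS, and leaves a stream of n / k elements.
 * Returns the number of passes this greedy model takes to reach a single
 * element, or -1 if some pass finds no usable k. */
int count_reduction_passes(unsigned n)
{
    assert(n >= 1);
    int passes = 0;
    while (n > 1) {
        unsigned k = MAX_KERNEL_INPUTS;
        while (k >= 2 && n % k != 0)   /* look for the largest k that divides n */
            --k;
        if (k < 2)
            return -1;                 /* no split into 2..8 equal parts exists */
        n /= k;
        ++passes;
    }
    return passes;
}
```

In this model a size of 76 is split by 4 into 19, and 19 cannot be split further, so the function returns -1. A size of 8 takes one pass, and a size of 10 takes two passes (10 -> 2 -> 1), so the model by itself does not predict a failure for 10.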
Then why does it fail with 10? 10 = 2*5.