
Reduction kernel error
josopait Aug 7, 2008 7:03 PM (in response to traits)
I get the same error with the following test program:

reduce void sum( float4 a<>, reduce float4 b<> )
{
    b += a;
}

int main()
{
    float4 a<76>;
    float4 b;
    sum(a, b);
    return 0;
}

It only fails if the size of a is 76. With most other sizes that I have tried, the error disappears.

lpw Aug 8, 2008 1:21 PM (in response to josopait)
Originally posted by: josopait
It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.

The prime factorization of 76 is 2*2*19.
19 is bad news (see my reply to traits in this thread).


lpw Aug 8, 2008 1:18 PM (in response to traits)
Originally posted by: traits
I found this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).
Why?

As mentioned by udeepta in the above thread, the prime factorization of the stream size can have only 2, 3, 5 and 7 as factors.
This is because a kernel can take up to 8 inputs. I suspect that Brook+ does reductions by recursively partitioning the input stream into up to 8 subdomains and, with each pass, attaching the subdomains as inputs to the reduction kernel (translated to IL). If at any pass the current input stream cannot be divided evenly into 2, 3, 4, 5, 6, 7, or 8 parts, Brook+ blows chunks.
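
The rule stated above can be checked on the host before a reduction is launched. The following is a minimal sketch in plain C, not Brook+ code: the function name has_only_small_factors is hypothetical, and it tests only the stated rule (no prime factor other than 2, 3, 5 or 7). It does not reproduce whatever partitioning Brook+ actually performs, so a size that passes this check is not guaranteed to work.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Brook+.
 * Returns 1 if n has no prime factor other than 2, 3, 5 or 7,
 * and 0 otherwise (including n == 0). */
int has_only_small_factors(unsigned n)
{
    static const unsigned primes[] = {2, 3, 5, 7};
    size_t i;

    if (n == 0)
        return 0;

    /* Divide out every allowed prime factor as often as possible. */
    for (i = 0; i < sizeof primes / sizeof primes[0]; ++i) {
        while (n % primes[i] == 0)
            n /= primes[i];
    }

    /* Anything left over is a prime factor larger than 7. */
    return n == 1;
}
```

For the stream size from the test program, 76 = 2*2*19, the two factors of 2 are divided out and 19 remains, so the check returns 0.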

traits Aug 9, 2008 1:09 AM (in response to lpw)
Why does it fail with 10? 10 = 2*5.
