4 Replies Latest reply on Aug 9, 2008 1:09 AM by traits

    Reduction kernel error

    traits

      I'm testing the spmv sample in Brook+. When I change NzWidth from 2 to 10, it reports "Failed to find usable kernel fragment to implement requested reduction". When the value is 8, it runs correctly.

      I found this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).

      Why?

        • Reduction kernel error
          josopait

          I get the same error with the following test program:

           

          reduce void sum( float4 a<>, reduce float4 b<> )
          {
              b += a;
          }


          int main()
          {
              float4 a<76>;
              float4 b;
              sum(a, b);

              return 0;
          }

           

          It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.

           

            • Reduction kernel error
              lpw

               

              Originally posted by: josopait
              It only fails if the size of a is 76. With most other sizes that I have tried the error disappears.


              The prime factorization of 76 is 2*2*19.

              19 is bad news (see post above).

            • Reduction kernel error
              lpw

               

              Originally posted by: traits
              I find this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).

               

              Why?

               

              As mentioned by udeepta in the above thread, the prime factorization of the stream size can have only 2, 3, 5 and 7 as factors.

              This is because a kernel can take up to 8 inputs. I suspect that Brook+ does reductions by recursively partitioning the input stream into up to 8 subdomains and, with each pass, attaching the subdomains as inputs to the reduction kernel (translated to IL). If at any pass the current input stream cannot be divided into 2, 3, 4, 5, 6, 7, or 8 parts Brook+ blows chunks.
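The rule described above can be checked on the host before calling a reduction. The following is only a sketch of that rule as stated in this thread, not Brook+'s actual implementation; the function names `leftover_after_small_factors` and `has_only_small_factors` are made up for illustration.

```c
/* Hypothetical helpers (not part of Brook+): test the "only 2, 3, 5, 7"
 * rule quoted in this thread for a given stream size. */

/* Divide out every factor of 2, 3, 5 and 7 and return what remains.
 * A result of 1 means the size has no other prime factors.
 * Returns 0 for n == 0, which is treated as an invalid size. */
unsigned int leftover_after_small_factors(unsigned int n)
{
    static const unsigned int primes[] = { 2, 3, 5, 7 };
    int i;

    if (n == 0)
        return 0;

    for (i = 0; i < 4; ++i) {
        while (n % primes[i] == 0)
            n /= primes[i];
    }
    return n;
}

/* Returns 1 if the size satisfies the rule, 0 otherwise. */
int has_only_small_factors(unsigned int n)
{
    return leftover_after_small_factors(n) == 1;
}
```

For a size of 76 the leftover is 19, so the check fails, which matches the behaviour reported for `float4 a<76>`. Sizes 8 and 10 both pass this check, so the rule as stated does not by itself account for a failure at 10.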

                • Reduction kernel error
                  traits

                   

                  Originally posted by: lpw
                  Originally posted by: traits
                  I find this thread (http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=96153&highlight_key=y&keyword1=reduction).

                   

                  Why?

                   

                  As mentioned by udeepta in the above thread, the prime factorization of the stream size can have only 2, 3, 5 and 7 as factors.

                  This is because a kernel can take up to 8 inputs. I suspect that Brook+ does reductions by recursively partitioning the input stream into up to 8 subdomains and, with each pass, attaching the subdomains as inputs to the reduction kernel (translated to IL). If at any pass the current input stream cannot be divided into 2, 3, 4, 5, 6, 7, or 8 parts Brook+ blows chunks.

                  Then why does it fail with 10? 10 = 2*5.