
    Problems with Brook+ code only if BRT_RUNTIME=cal

    lollo

      Hi,
      I'm experiencing some problems with a Brook+ code (a GMRES solver for sparse matrices), which runs perfectly with BRT_RUNTIME=cpu but gives totally different results if BRT_RUNTIME is set to cal.

      I'm using double precision, sparse matrix-vector multiplications, and dense matrix-matrix multiplications (with parts of the code taken from the Brook+ legacy samples "sparse_matrix_vector" and "double_precision_simple_matmult").

      The matrix-matrix multiplication's gather input parameters and the output parameters are often subsets of bigger matrices. To select the sub-matrices, I'm using the "domain()" operator.

      I thought that the problem could be related to the domain() operator combined with gather indexing, but simple matrix-multiplication tests with input and output parameters selected through the domain() operator gave correct results.
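
      For reference, the calls look roughly like this (a simplified sketch: names and sizes are invented, the domain() arguments are written from memory, and matmult stands for a kernel in the style of the double_precision_simple_matmult sample, so take it as an illustration of the pattern rather than my actual code):

          double A<128, 128>;    // large matrices declared as 2D streams
          double B<128, 128>;
          double C<128, 128>;

          // ... fill A and B with streamRead() ...

          // multiply only the top-left 64x64 blocks: A and B become gather
          // parameters inside the kernel, C's sub-block is the output domain
          matmult( 64.0,
                   A.domain(int2(0, 0), int2(64, 64)),
                   B.domain(int2(0, 0), int2(64, 64)),
                   C.domain(int2(0, 0), int2(64, 64)) );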

      I also found this post http://forums.amd.com/devforum...id=97566&enterthread=y , but in my case the gather input parameters are indexed as double[][].

      Does anyone have any idea of what could be a possible source of errors?

      Thanks,

      Lorenzo

       

        • Problems with Brook+ code only if BRT_RUNTIME=cal
          gaurav.garg

          Have you checked error and errorlog on your streams?
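
          For example, something like this after each kernel call (assuming the Stream error-reporting calls from the Brook+ docs; the stream name is just a placeholder):

              // check the output stream for runtime errors after the kernel call
              if (ystream.error())
              {
                  // errorLog() returns a description of what went wrong in the runtime
                  printf("Brook+ error: %s\n", ystream.errorLog());
              }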

            • Problems with Brook+ code only if BRT_RUNTIME=cal
              lollo

              Ok, I found out the cause of my problems.

              To multiply a vector by a scalar, say y = a*x, I was using a simple kernel like this:

                  kernel void stream_mult ( double instream1<>, double instream2<>, out double outstream<> )
                  {
                       // element-wise product of the two input streams
                       outstream = instream1*instream2;
                  }

              and then calling it from the main body: stream_mult(xstream, astream, ystream);

               

              This worked fine with cpu, but not with BRT_RUNTIME=cal.

              To make the function run correctly on the GPU, the scalar coefficient a must be passed as a gather parameter.

              Here is the correct way:

                  kernel void stream_scalar_mult ( double instream<>, double coef[], out double outstream<> )
                  {
                      // coef is a gather parameter; coef[0] holds the scalar a
                      outstream = instream*coef[0];
                  }

              calling it in the main body: stream_scalar_mult(xstream, astream, ystream);
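
              For completeness, the host side looks roughly like this (a simplified sketch assuming the legacy streamRead/streamWrite interface used by the SDK samples; sizes and names are just for illustration):

                  double a = 2.0;              // the scalar coefficient
                  double x[1024], y[1024];     // host-side data

                  double xstream<1024>;        // input vector stream
                  double astream<1>;           // 1-element stream, read as a gather in the kernel
                  double ystream<1024>;        // output vector stream

                  streamRead(xstream, x);
                  streamRead(astream, &a);

                  stream_scalar_mult(xstream, astream, ystream);

                  streamWrite(ystream, y);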

               

              I don't understand exactly why it should be incorrect to pass the scalar coefficient as a normal input stream: it has dimension 1, so I would expect it to combine without problems with streams of any dimension (there is no dimension-mismatch issue!).

              Thank you