3 Replies Latest reply on Sep 18, 2009 4:22 PM by gaurav.garg

    Problems with Brook+ code only if BRT_RUNTIME=cal


      I'm experiencing some problems with a Brook+ code (a GMRES solver for sparse matrices), which runs perfectly with BRT_RUNTIME=cpu, but which gives totally different results if BRT_RUNTIME is set to cal.

      I'm using double precision, sparse matrix-vector multiplications and dense matrix-matrix multiplications (with parts of code taken from Brook+ legacy samples "sparse_matrix_vector" and "double_precision_simple_matmult").

      The matrix-matrix multiplication's gather input parameters and the output parameters are often subsets of bigger matrices. To select the sub-matrices, I'm using the "domain()" operator.

      I thought that the problem could be related to using the domain() operator together with gather indexing, but I got correct results when running simple matrix multiplication tests whose input and output parameters were selected with the domain() operator.

      I also found this post http://forums.amd.com/devforum...id=97566&enterthread=y , but in my case the gather input parameters are declared as double [ ] [ ].

      Does anyone have any idea of what could be a possible source of errors?




        • Problems with Brook+ code only if BRT_RUNTIME=cal

          Have you checked error and errorlog on your streams?

            • Problems with Brook+ code only if BRT_RUNTIME=cal

              Ok, I found out the cause of my problems.

              To multiply a vector by a scalar, let us say y = a*x, I was running a simple kernel like this:

                  kernel void stream_mult ( double instream1<>, double instream2<>, out double outstream<> )
                  {
                       outstream = instream1*instream2;
                  }

              and then calling it from the main body:  stream_mult ( xstream, astream, ystream );


              This worked fine with cpu, but not with BRT_RUNTIME=cal.

              To make the function run correctly on GPU, the scalar coefficient a must be passed as a gather parameter.

              Here is the correct way:

                  kernel void stream_scalar_mult ( double instream<>, double coef[], out double outstream<> )
                  {
                      outstream = instream*coef[0];
                  }

              calling in the main body: stream_scalar_mult ( xstream, astream, ystream ) ;
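
For anyone who wants to check the GPU output against a known-good result, here is a minimal plain-C sketch of what the gather version computes, plus a helper that counts how many elements differ. It is not Brook+ code and does not use the Brook+ runtime; the names ref_scalar_mult and count_mismatches and the tolerance rule are only my assumptions for illustration.

```c
#include <stddef.h>

/* CPU reference for y = a * x, mirroring the gather kernel:
   every output element reads the same coef[0].
   (Hypothetical helper, not part of Brook+.) */
void ref_scalar_mult(const double *x, const double *coef, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * coef[0];
    }
}

/* Count the elements of got[] that differ from want[] by more than
   rel_tol * max(|want[i]|, 1.0). (Hypothetical helper.) */
size_t count_mismatches(const double *got, const double *want,
                        size_t n, double rel_tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        double diff  = got[i] - want[i];
        double scale = want[i] < 0.0 ? -want[i] : want[i];
        if (diff < 0.0)  diff = -diff;
        if (scale < 1.0) scale = 1.0;
        if (diff > rel_tol * scale) ++bad;
    }
    return bad;
}
```

Copying ystream back to a host array after a BRT_RUNTIME=cal run and passing it to count_mismatches together with the output of ref_scalar_mult should show whether only some elements are wrong (for example everything after the first) or all of them.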


              I don't understand exactly why it should be incorrect to pass the scalar coefficient as a normal input stream: it has dimension 1, so it should combine without problems with streams of any dimension (there is no mismatch in the number of dimensions!).

              Thank you