OK, I found the cause of my problem.
To multiply a vector by a scalar, say y = a*x, I was running a simple kernel like this:
kernel void stream_mult ( double instream1<>, double instream2<>, double outstream<> )
{
    outstream = instream1*instream2;
}
and then calling it from the main body: stream_mult ( xstream, astream, ystream );
This worked fine on the CPU, but not with BRT_RUNTIME=cal.
To make the kernel run correctly on the GPU, the scalar coefficient a must be passed as a gather parameter.
Here is the version that works:
kernel void stream_scalar_mult ( double instream<>, double coef[], double outstream<> )
{
    outstream = instream*coef[0];
}
called from the main body as: stream_scalar_mult ( xstream, astream, ystream );
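For anyone who wants to check the GPU output, the expected result can be computed on the host in plain C. This is only a minimal sketch and not part of the Brook+ API: the function name scalar_mult_reference and the length parameter n are hypothetical. Like the gather version of the kernel, it reads coef[0] for every output element.

```c
#include <stddef.h>

/* Hypothetical host-side reference for y = a*x (not a Brook+ call).
 * x    : input vector of n elements
 * coef : array holding the scalar a in coef[0], as in the gather kernel
 * y    : output vector of n elements
 */
void scalar_mult_reference(const double *x, const double *coef,
                           double *y, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        /* every output element uses the same coefficient coef[0] */
        y[i] = x[i] * coef[0];
    }
}
```

Comparing this array element by element with the data read back from ystream should show whether a given backend produced the intended product.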
I don't understand exactly why it should be incorrect to pass the scalar coefficient as a normal input stream: since it has dimension 1, I would expect it to be compatible with streams of any dimension (there is no mismatch in the number of dimensions!).
Thank you.