Can I use a stream as input when calling DGEMM? I have already copied the matrices to the GPU's local memory. How can I call DGEMM without transferring the data again?
The current DGEMM API in ACML-GPU is identical to the standard BLAS DGEMM API, so it accepts pointers to host memory. The idea is that you can replace any BLAS library with ACML-GPU without modifying your code.
I understand the situation you describe, where the input matrices are already on the GPU, and I will pass it on as a feature request.