0 Replies Latest reply on Jul 31, 2013 9:55 AM by matusi143

    Suggested implementation for Sparse Matrix Inversion and Multiplication

    matusi143

      GPU Compute Programmers,


      I have a C++ program that currently relies on ACML (LAPACK) to invert and multiply fairly large matrices of single-precision floating-point values (e.g., 4,000 x 4,000).  These matrices are very sparse, although they do not always fit nicely into a banded/diagonal form, so I cannot presently reduce them.  The other thing about this program is that I have to perform this invert-and-multiply several times (serially) as part of a Newton-Raphson iteration.  However, I have several thousand permutations that can be done in parallel, each with a small change to the matrix before the Jacobian is recalculated and inverted again.  This is all single-precision floating point and seems perfectly suited to the GPU.  My question is this...


      I suspect I will need to use the Accelerated Parallel Processing Math Libraries (APPML), as that is the only option available with BLAS functionality, although I do not see the LAPACK dgetrf_ and dgetri_ functions included in APPML (yes, these are fp64, but I don't need that precision).   Would C++ AMP be a better alternative?  I am very interested in the HSA feature of passing pointers rather than copying data, as there is a lot of data in flight here and some calculations are still done on the CPU.  Ultimately, performance is the key, and I want to make the right architectural decisions now to set myself up for the most performance I can wring out of the HSA GPUs coming out over the next six months.


      Any thoughts, additional questions or discussion would be greatly appreciated!