2 Replies Latest reply on Aug 14, 2015 7:07 AM by dns.on.gpu

    Small Matrix Multiplication




      Most, if not all, off-the-shelf routines for matrix-matrix multiplication are
      tuned for large matrices. The problems I am trying to run on a GPU (280X) involve
      a large number (typically 200-300K) of relatively small (~40x40) matrices, which
      arrive in batches of 2-3K. All calculations must be done in fp64. I have written
      my own kernel for these, using LDS and VGPRs in various combinations, but I still
      cannot beat a 6-core CPU running OpenMP.
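
      For readers who want a concrete picture of the workload, here is a minimal
      sketch of the kind of CPU baseline described above: one independent product per
      matrix in the batch, parallelized over the batch with OpenMP. This is not the
      poster's code. The names batched_matmul and flops_per_product, the row-major
      contiguous layout, and the loop order are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post).
// Computes C[b] = A[b] * B[b] for `batch` independent n x n fp64 matrices.
// Assumed layout: row-major, matrix b starts at offset b * n * n.
void batched_matmul(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C,
                    std::size_t batch, std::size_t n)
{
    assert(A.size() == batch * n * n);
    assert(B.size() == batch * n * n);
    C.assign(batch * n * n, 0.0);

    // Each iteration handles one whole matrix product, so threads never
    // write to the same output. The pragma is skipped without OpenMP.
#if defined(_OPENMP)
    #pragma omp parallel for schedule(static)
#endif
    for (long long b = 0; b < static_cast<long long>(batch); ++b) {
        const std::size_t off = static_cast<std::size_t>(b) * n * n;
        const double* a  = A.data() + off;
        const double* bm = B.data() + off;
        double*       c  = C.data() + off;

        // i-k-j order keeps the innermost loop contiguous in both bm and c.
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < n; ++k) {
                const double aik = a[i * n + k];
                for (std::size_t j = 0; j < n; ++j) {
                    c[i * n + j] += aik * bm[k * n + j];
                }
            }
        }
    }
}

// Floating-point operations for one n x n product (n^3 multiplies, n^3 adds).
double flops_per_product(std::size_t n)
{
    return 2.0 * static_cast<double>(n) * static_cast<double>(n)
               * static_cast<double>(n);
}
```

      With n = 40, each product is 2 * 40^3 = 128,000 floating-point operations on
      three 12,800-byte matrices (40 * 40 * 8 bytes), so each matrix offers little
      work relative to the data it moves. That is one plausible reason a
      batch-parallel CPU loop like this is hard to beat.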


      Does anyone have information or suggestions for handling this type of problem
      on a Tahiti GPU?