1 Reply Latest reply on Jun 19, 2011 7:41 PM by himanshu.gautam

    GEMM implementation using OpenCL

    divij
      Optimizing SGEMM implementation using OpenCL

      Hey,

      I want to write an OpenCL routine for SGEMM and optimize it.
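For reference, SGEMM is the single-precision general matrix multiply C = alpha*A*B + beta*C. A naive CPU version is a useful correctness check for any GPU kernel. This is a minimal sketch in C that assumes row-major storage and no transpose options (the BLAS routine itself is column-major and takes transpose flags). `sgemm_ref` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference SGEMM (hypothetical helper), assuming row-major storage
 * and no transposes:  C = alpha * A * B + beta * C
 * A is M x K, B is K x N, C is M x N. */
void sgemm_ref(int M, int N, int K, float alpha,
               const float *A, const float *B,
               float beta, float *C)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            /* dot product of row i of A with column j of B */
            for (int k = 0; k < K; ++k)
                acc += A[(size_t)i * K + k] * B[(size_t)k * N + j];
            C[(size_t)i * N + j] = alpha * acc + beta * C[(size_t)i * N + j];
        }
    }
}
```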

      I have read the following threads related to this:

      http://forum.beyond3d.com/showthread.php?t=54842

      http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=127963&STARTPAGE=1&FTVAR_FORUMVIEWTMP=Linear

      To optimize their code, people have adopted the following techniques:

      1. Blocking

      2. Using the texture cache instead of the LDS

      3. Using register files
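Blocking and register use (items 1 and 3) can be illustrated with a plain C sketch that mimics the loop structure of a kernel. It is not an OpenCL kernel. It assumes row-major storage, beta = 0, and dimensions divisible by a hypothetical register-block size `RB`. Each outer iteration stands in for one work-item that keeps an RB x RB block of C in registers. The texture-cache path (item 2) is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define RB 4  /* hypothetical register-block size: each "work-item" owns an RB x RB patch of C */

/* Register-blocking sketch (hypothetical name), assuming row-major storage,
 * beta = 0, and M, N divisible by RB. Each (bi, bj) iteration plays the role
 * of one work-item: for every k it loads RB values of A and RB values of B,
 * then does RB*RB multiply-adds into a private accumulator. */
void sgemm_regblock(int M, int N, int K, float alpha,
                    const float *A, const float *B, float *C)
{
    assert(M % RB == 0 && N % RB == 0);
    for (int bi = 0; bi < M; bi += RB) {
        for (int bj = 0; bj < N; bj += RB) {
            float acc[RB][RB] = {{0.0f}};   /* would live in registers on a GPU */
            for (int k = 0; k < K; ++k) {
                float a[RB], b[RB];
                for (int r = 0; r < RB; ++r) {
                    a[r] = A[(size_t)(bi + r) * K + k];  /* column fragment of A */
                    b[r] = B[(size_t)k * N + bj + r];    /* row fragment of B */
                }
                /* outer-product update of the RB x RB block */
                for (int r = 0; r < RB; ++r)
                    for (int c = 0; c < RB; ++c)
                        acc[r][c] += a[r] * b[c];
            }
            for (int r = 0; r < RB; ++r)
                for (int c = 0; c < RB; ++c)
                    C[(size_t)(bi + r) * N + bj + c] = alpha * acc[r][c];
        }
    }
}
```

Each k step loads 2*RB values and performs RB*RB multiply-adds. Larger blocks therefore raise arithmetic intensity, until register pressure starts to limit how many wavefronts can be resident.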

      They used CAL, Brook++, and IL to program their kernels. However, CAL is soon going to be deprecated in favour of OpenCL.

      My question is:

      How do I optimize my matrix-multiply code in OpenCL on Cayman?

      I have already implemented tiling, with the computation done on tiles held in local memory, but the performance is still very poor.
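For comparison, this plain C sketch emulates one common shape of local-memory tiling by running a work-group sequentially. The tile size `TS`, the function name, row-major storage, beta = 0, and dimensions divisible by TS are all assumptions. The comments mark where `barrier(CLK_LOCAL_MEM_FENCE)` calls would sit in a real OpenCL kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TS 8  /* hypothetical tile size, i.e. an 8x8 work-group */

/* Local-memory tiling sketch (hypothetical name), assuming row-major storage,
 * beta = 0, and M, N, K divisible by TS. The (gi, gj) loops stand in for
 * work-groups; the (li, lj) loops stand in for work-items within a group. */
void sgemm_local_tiled(int M, int N, int K, float alpha,
                       const float *A, const float *B, float *C)
{
    assert(M % TS == 0 && N % TS == 0 && K % TS == 0);
    float Asub[TS][TS], Bsub[TS][TS];          /* stand-ins for __local buffers */
    for (int gi = 0; gi < M; gi += TS) {
        for (int gj = 0; gj < N; gj += TS) {
            float acc[TS][TS] = {{0.0f}};      /* one private accumulator per work-item */
            for (int kt = 0; kt < K; kt += TS) {
                /* Phase 1: each work-item (li, lj) copies one element of each tile. */
                for (int li = 0; li < TS; ++li)
                    for (int lj = 0; lj < TS; ++lj) {
                        Asub[li][lj] = A[(size_t)(gi + li) * K + kt + lj];
                        Bsub[li][lj] = B[(size_t)(kt + li) * N + gj + lj];
                    }
                /* barrier(CLK_LOCAL_MEM_FENCE) would go here in a real kernel */
                /* Phase 2: each work-item reads TS values from each local tile. */
                for (int li = 0; li < TS; ++li)
                    for (int lj = 0; lj < TS; ++lj)
                        for (int k = 0; k < TS; ++k)
                            acc[li][lj] += Asub[li][k] * Bsub[k][lj];
                /* a second barrier is needed before the tiles are overwritten */
            }
            for (int li = 0; li < TS; ++li)
                for (int lj = 0; lj < TS; ++lj)
                    C[(size_t)(gi + li) * N + gj + lj] = alpha * acc[li][lj];
        }
    }
}
```

In this form each work-item still computes only one element of C, so it does not use the register blocking listed above. Combining local-memory tiles with a per-work-item block of accumulators is the usual next step.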