Archives Discussions

divij · ‎06-17-2011

Optimizing sgemm implementation using openCL

Hey,

I want to write an OpenCL routine for sgemm and optimize it.

I have read the following threads related to this:

http://forum.beyond3d.com/showthread.php?t=54842

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=127963&STARTPAGE=1&FTVAR_FORUMVIEWTMP=Linear

To optimise their code people have adopted:

1. Blocking

2. Using texture cache instead of LDS

3. Using register files
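To make points 1 and 3 concrete, here is a minimal sketch in plain C (not OpenCL, and not the code from either thread) of blocking with the accumulators kept in a small local array, which is roughly what each work-item in a blocked kernel does. The function name `sgemm_blocked`, the `TILE` value of 4, and the row-major layout are my own assumptions for illustration. It can also serve as a CPU reference to check a GPU kernel against.

```c
#include <stddef.h>

#define TILE 4 /* assumed block size, picked only for illustration */

/* Computes C = A * B for row-major matrices:
 * A is M x K, B is K x N, C is M x N.
 * Each TILE x TILE block of C is accumulated in a small local array
 * (a stand-in for registers) and written out once at the end. */
void sgemm_blocked(int M, int N, int K,
                   const float *A, const float *B, float *C)
{
    for (int i0 = 0; i0 < M; i0 += TILE) {
        for (int j0 = 0; j0 < N; j0 += TILE) {
            /* Clamp the block at the matrix edges. */
            int imax = (M - i0 < TILE) ? (M - i0) : TILE;
            int jmax = (N - j0 < TILE) ? (N - j0) : TILE;

            float acc[TILE][TILE] = {{0.0f}};

            /* Walk the shared dimension; each loaded element of A is
             * reused across a whole row of the block. */
            for (int k = 0; k < K; ++k) {
                for (int i = 0; i < imax; ++i) {
                    float a = A[(size_t)(i0 + i) * K + k];
                    for (int j = 0; j < jmax; ++j)
                        acc[i][j] += a * B[(size_t)k * N + j0 + j];
                }
            }

            /* Single write-back of the finished block. */
            for (int i = 0; i < imax; ++i)
                for (int j = 0; j < jmax; ++j)
                    C[(size_t)(i0 + i) * N + j0 + j] = acc[i][j];
        }
    }
}
```

In an OpenCL version the two outer loops would presumably map onto the global work-item IDs, so each work-item owns one block of C. The best block size on a given GPU is something to measure, not something this sketch settles.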

They used CAL, Brook+ and IL to program the kernels. However, CAL is soon going to be deprecated in favour of OpenCL.

My question is:

How do I optimize my matrix multiply code using OpenCL on Cayman?

I have already implemented tiling, with the computation done out of local memory (LDS), but the performance is very poor.

himanshu_gautam · ‎06-19-2011

There is a kernel in the SDK samples for matrix multiplication (MatrixMultiplication) which might be helpful to you. Also, sgemm and dgemm are already available in the clAmdBlas library, so you can use that directly 🙂
