GEMM implementation using OpenCL

Discussion created by divij on Jun 17, 2011
Latest reply on Jun 19, 2011 by himanshu.gautam
Optimizing an SGEMM implementation using OpenCL


I want to write an OpenCL routine for SGEMM and optimize it.

I have read the following threads related to this:



To optimise their code, people in those threads have adopted:

1. Blocking

2. Using texture cache instead of LDS

3. Using register files

They used CAL, Brook++ and IL to program the kernel. However, CAL is soon going to be deprecated in favour of OpenCL.
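
For readers unfamiliar with the terms, blocking and register accumulation can be sketched on the CPU in plain C. This is only an illustration of the idea, not the kernels from those threads. The names sgemm_naive, sgemm_blocked, TILE and REG, and the tile sizes, are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical tile sizes, chosen for illustration only. */
#define TILE 16 /* block edge, loosely analogous to a work-group tile */
#define REG  4  /* each REG x REG patch of C is accumulated in local variables */

/* Reference: C = A * B, all matrices n x n, row-major. */
void sgemm_naive(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Blocked version. Assumes n is a multiple of TILE and TILE of REG. */
void sgemm_blocked(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i0 = 0; i0 < n; i0 += TILE)
    for (size_t j0 = 0; j0 < n; j0 += TILE)
    for (size_t i = i0; i < i0 + TILE; i += REG)
    for (size_t j = j0; j < j0 + TILE; j += REG) {
        /* Small accumulator patch the compiler can keep in registers. */
        float acc[REG][REG] = {{0.0f}};

        /* Walk the shared dimension one TILE-wide slab at a time. */
        for (size_t k0 = 0; k0 < n; k0 += TILE)
            for (size_t k = k0; k < k0 + TILE; ++k)
                for (size_t r = 0; r < REG; ++r) {
                    float a = A[(i + r) * n + k]; /* reused REG times */
                    for (size_t c = 0; c < REG; ++c)
                        acc[r][c] += a * B[k * n + (j + c)];
                }

        /* Write the finished patch back once. */
        for (size_t r = 0; r < REG; ++r)
            for (size_t c = 0; c < REG; ++c)
                C[(i + r) * n + (j + c)] = acc[r][c];
    }
}
```

On a GPU the outer tile loops would roughly correspond to work-groups and the REG x REG patch to one work-item's registers. The sketch only shows the loop structure and says nothing about which tile sizes suit a particular device.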

My question is:

How do I optimize my matrix multiply code using OpenCL on Cayman?

I have already implemented tiling, with the computation done out of local memory (LDS), but the performance is very poor.