# Is matrix multiplication OK this way?

Question asked by drknoerig77 on Sep 22, 2013
Latest reply on Sep 25, 2013 by himanshu.gautam

I wrote a small matrix multiplication kernel for self-educational purposes:

/**
 * @brief Matrix multiplication kernel (matrix-vector product y = A*x).
 * @param A The matrix as concatenated row vectors.
 * @param x Multiplicand vector (size: number of columns in A).
 * @param y Result vector A*x (size: number of rows in A).
 * @param nRows Number of rows in the matrix.
 * @param nCols Number of columns in the matrix.
 * @param xbuffer Local memory buffer for the multiplicand vector.
 * @param resultBuffer Local memory buffer for the element-wise products of one row.
 */

__kernel void matrixMultiplication(__global double *A,
                                   __global double *x,
                                   __global double *y,
                                   unsigned int nRows,
                                   unsigned int nCols,
                                   __local double *xbuffer,
                                   __local double *resultBuffer)
{
    int column = get_local_id(0);    // equals the column number
    int row = get_group_id(0);       // fetch the workgroup number - should be the row number
    int globalID = get_global_id(0);

    // Copy the vector into local memory, one element per work-item.
    xbuffer[column] = x[column];
    barrier(CLK_LOCAL_MEM_FENCE);

    // Each work-item computes one term of the row-vector/vector product.
    resultBuffer[column] = xbuffer[column] * A[globalID];
    barrier(CLK_LOCAL_MEM_FENCE);

    // The first work-item of the group sums up all terms of its row.
    double sum = 0;
    if (column == 0)
    {
        for (int k = 0; k < nCols; k++) sum += resultBuffer[k];
        y[row] = sum;
    }
}
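
To check the kernel's output, the result can be compared against a plain CPU loop. The following is only a minimal sketch, assuming the same row-major layout (concatenated row vectors) as the kernel; the function name `referenceMatVec` is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for y = A*x. A is stored as concatenated row
// vectors (row-major), which is the layout the kernel indexing relies on.
std::vector<double> referenceMatVec(const std::vector<double>& A,
                                    const std::vector<double>& x,
                                    std::size_t nRows, std::size_t nCols)
{
    assert(A.size() == nRows * nCols);
    assert(x.size() == nCols);

    std::vector<double> y(nRows, 0.0);
    for (std::size_t row = 0; row < nRows; ++row) {
        double sum = 0.0;
        for (std::size_t col = 0; col < nCols; ++col) {
            // row * nCols + col is the element the kernel reads as A[globalID]
            sum += A[row * nCols + col] * x[col];
        }
        y[row] = sum;
    }
    return y;
}
```

Since the kernel adds the terms in the same order, the two results should agree closely, but comparing with a small tolerance is safer than testing for exact equality.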

I call the kernel this way:

cl::NDRange globalSize(A.getNumberOfRows()*A.getNumberOfColumns());
cl::NDRange localSize(A.getNumberOfColumns());
rcpp::KernelFunctor vektorSquareFunc(matrixMultiplicationKernel,queue,globalSize,localSize);
vektorSquareFunc(cl::Buffer(A),
                 cl::Buffer(x),
                 cl::Buffer(y),
                 A.getNumberOfRows(),
                 A.getNumberOfColumns(),
                 rcpp::KernelFunctor::createLocalMemoryBuffer<double>(A.getNumberOfColumns()),
                 rcpp::KernelFunctor::createLocalMemoryBuffer<double>(A.getNumberOfColumns()));
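
The launch above uses one work-group per row and one work-item per column, so the number of columns is bounded by what the device allows per work-group and by the available local memory. The sketch below only spells out that arithmetic; `LaunchConfig`, `makeLaunchConfig`, `fitsDevice` and `globalId` are hypothetical helpers, and the limits are passed in as plain numbers rather than queried from a device.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the 1-D launch used for the kernel.
struct LaunchConfig {
    std::size_t globalSize;     // total work-items: nRows * nCols
    std::size_t localSize;      // work-items per work-group: nCols
    std::size_t localMemBytes;  // two __local double buffers of nCols elements
};

LaunchConfig makeLaunchConfig(std::size_t nRows, std::size_t nCols)
{
    return { nRows * nCols, nCols, 2 * nCols * sizeof(double) };
}

// True if the launch respects the given limits. On a real device these
// would come from CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_LOCAL_MEM_SIZE.
bool fitsDevice(const LaunchConfig& cfg,
                std::size_t maxWorkGroupSize,
                std::size_t localMemSize)
{
    return cfg.localSize <= maxWorkGroupSize
        && cfg.localMemBytes <= localMemSize;
}

// Value get_global_id(0) reports for a work-item (global offset 0):
// group index times work-group size plus the local index.
std::size_t globalId(const LaunchConfig& cfg,
                     std::size_t groupId, std::size_t localId)
{
    assert(localId < cfg.localSize);
    return groupId * cfg.localSize + localId;
}
```

For example, with an assumed limit of 256 work-items per group, `fitsDevice(makeLaunchConfig(4, 100), 256, 32768)` holds, while a matrix with 1000 columns would not fit into a single work-group.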

It basically works.

My questions are:

1. Is it OK to use the number of matrix columns as the local (work-group) size?

2. Is my usage of the memory fences and the local memory OK?

3. How can I handle the summation of the row-vector/vector product in a better way?
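
Regarding question 3, one commonly used alternative to summing in a single work-item is a tree reduction in local memory, where the number of active work-items is halved in every step. The sketch below only simulates that idea on the CPU and is not a kernel. `treeReduce` is a hypothetical name, and each pass of the outer loop stands for one step that would be separated by `barrier(CLK_LOCAL_MEM_FENCE)` on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU simulation of a work-group tree reduction over the
// products stored in a local buffer. The buffer is taken by value so the
// caller's data stays untouched while it is reduced in place here.
double treeReduce(std::vector<double> buffer)
{
    assert(!buffer.empty());

    std::size_t active = buffer.size();
    while (active > 1) {
        // Round up so that sizes which are not a power of two also work.
        std::size_t half = (active + 1) / 2;

        // On the device, "work-item" i would execute this body in parallel.
        // Reads touch indices >= half, writes touch indices < half, so the
        // sequential loop gives the same result as the parallel step.
        for (std::size_t i = 0; i + half < active; ++i) {
            buffer[i] += buffer[i + half];
        }

        // A barrier would be needed here before the next step.
        active = half;
    }
    return buffer[0];
}
```

With this scheme a row of n terms takes about log2(n) steps instead of n sequential additions in one work-item. The order of the additions changes, so the floating-point result may differ slightly from the sequential sum.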