Matrix multiplication ok this way?

Question asked by drknoerig77 on Sep 22, 2013
Latest reply on Sep 25, 2013 by himanshu.gautam

I wrote a small matrix multiplication kernel for self-educational purposes:

 

/**
 * @brief Matrix-vector multiplication kernel (y = A*x).
 * @param A The matrix as concatenated row vectors (row-major).
 * @param x Multiplicand vector (size: number of columns in A)
 * @param y Result vector A*x (size: number of rows in A)
 * @param nRows Number of rows in the matrix.
 * @param nCols Number of columns in the matrix.
 * @param xbuffer Local memory buffer for the multiplicand vector (size: nCols).
 * @param resultBuffer Local memory buffer for the per-column products (size: nCols).
 */
__kernel void matrixMultiplication(__global double *A,
                                   __global double *x,
                                   __global double *y,
                                   unsigned int nRows,
                                   unsigned int nCols,
                                   __local double *xbuffer,
                                   __local double *resultBuffer)
{
    int column = get_local_id(0);    // equals the column number
    int row = get_group_id(0);       // fetch the workgroup number - should be the row number
    int globalID = get_global_id(0); // index of this work-item's element in A

    // Stage x in local memory, then wait until every work-item has written its entry.
    xbuffer[column] = x[column];
    barrier(CLK_LOCAL_MEM_FENCE);

    // One product per work-item; wait again before the products are summed.
    resultBuffer[column] = xbuffer[column] * A[globalID];
    barrier(CLK_LOCAL_MEM_FENCE);

    // The first work-item of each group sums the products of its row serially.
    double sum = 0;
    if (column == 0)
    {
        for (unsigned int k = 0; k < nCols; k++) sum += resultBuffer[k];
        y[row] = sum;
    }
}

 

I call the kernel this way:

        cl::NDRange globalSize(A.getNumberOfRows()*A.getNumberOfColumns());
        cl::NDRange localSize(A.getNumberOfColumns());
        rcpp::KernelFunctor vektorSquareFunc(matrixMultiplicationKernel,queue,globalSize,localSize);
        vektorSquareFunc(cl::Buffer(A),
                         cl::Buffer(x),
                         cl::Buffer(y),
                         A.getNumberOfRows(),
                         A.getNumberOfColumns(),
                         rcpp::KernelFunctor::createLocalMemoryBuffer<double>(A.getNumberOfColumns()),
                         rcpp::KernelFunctor::createLocalMemoryBuffer<double>(A.getNumberOfColumns()));
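
To make the index mapping behind this launch configuration explicit: with a global size of nRows*nCols and a local size of nCols, each workgroup covers one row, the local ID is the column, and the global ID is row*nCols + column. The following is only a CPU-side sketch in plain C++ that mimics this mapping to produce reference values for checking the kernel output. The function name matVecReference is hypothetical and not part of my wrapper classes, and it assumes A is stored row by row.

```cpp
#include <cstddef>
#include <vector>

// CPU reference that mimics the kernel's index mapping (hypothetical helper).
// Assumption: A is stored row by row and holds nRows * nCols entries.
std::vector<double> matVecReference(const std::vector<double>& A,
                                    const std::vector<double>& x,
                                    std::size_t nRows, std::size_t nCols)
{
    std::vector<double> y(nRows, 0.0);
    for (std::size_t group = 0; group < nRows; ++group) {      // plays the role of get_group_id(0): the row
        for (std::size_t local = 0; local < nCols; ++local) {  // plays the role of get_local_id(0): the column
            std::size_t global = group * nCols + local;        // plays the role of get_global_id(0)
            y[group] += x[local] * A[global];
        }
    }
    return y;
}
```

Comparing the device result in y against this reference (with a small tolerance) should show whether the workgroup-per-row layout behaves as intended.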

 

It basically works.

 

My questions are:

     1. Is it OK to use the number of matrix columns as the local (workgroup) size?

     2. Is my usage of the memory fences and the local memory OK?

     3. How can I sum up the products of the row-vector/vector multiplication in a better way?
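
Regarding question 3, one pattern I have seen mentioned as an alternative to the serial loop is a tree-style reduction in local memory, where the number of active work-items is halved in every step. The following is only a CPU-side sketch in plain C++ that emulates the idea, not a tested kernel. The function name treeReduce is hypothetical, and the sketch assumes the buffer length is a power of two (otherwise it would have to be padded with zeros).

```cpp
#include <cstddef>
#include <vector>

// Emulates a workgroup tree reduction over a local-memory buffer (hypothetical helper).
// Assumption: buf.size() is a power of two; pad with zeros otherwise.
double treeReduce(std::vector<double> buf)
{
    for (std::size_t stride = buf.size() / 2; stride > 0; stride /= 2) {
        // On the device, the work-items with local ID < stride would run this in parallel.
        for (std::size_t lid = 0; lid < stride; ++lid) {
            buf[lid] += buf[lid + stride];
        }
        // In a kernel, a barrier(CLK_LOCAL_MEM_FENCE) would be needed here after every step.
    }
    return buf.empty() ? 0.0 : buf[0];
}
```

With nCols work-items this would take about log2(nCols) steps instead of nCols serial additions by a single work-item.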