Slow Computation/Crashing Driver

Discussion created by notyou on Apr 5, 2010
Latest reply on Apr 9, 2010 by notyou

Hello everyone,

I've been working on improving my matrix multiplication code, and the algorithm now produces correct results on my OpenCL CPU device. The issue I'm having is that the computation is very slow compared to the provided MM sample, or even to host-based computation. Keep in mind that I am still very new to OpenCL, so any tips to increase my application's performance are welcome.

Also, when I move to the GPU device, once the matrix size reaches 256x256 the driver crashes, or at least appears to hang for a very long time. The resulting execution time is greater than that of both the host CPU and the OpenCL CPU device doing the same calculation.

__kernel void globalMM(__global int *A, __global int *B, __global int *C, int dimensions, int block_size)
{
    int group_id0 = get_group_id(0);
    int group_id1 = get_group_id(1);
    int local_id0 = get_local_id(0);
    int local_id1 = get_local_id(1);

    int row = (group_id0 * block_size) + local_id0;
    int col = (group_id1 * block_size) + local_id1;

    for(int k = 0; k < dimensions; k++)
    {
        C[row * dimensions + col] = 0;

        for(int j = 0; j < dimensions; j++)
            C[row * dimensions + col] += A[row * dimensions + j] * B[j * dimensions + col];
    }
}
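
One thing worth noting about the kernel above: the outer loop over k never uses k, so each work-item resets its element of C and recomputes the same dot product `dimensions` times, and every partial sum is read from and written back to global memory. At 256x256 that is 256 * 256 multiply-adds per output element instead of 256. This may account for much of the slowdown, and possibly for the apparent hang on the GPU, though that part is a guess. Below is a minimal sketch in plain C of what a single work-item could do instead, assuming one work-item per output element. The names `mm_work_item` and `mm_reference` are hypothetical and not from the original code; `mm_reference` is only a host-side stand-in for the 2D index space so the logic can be checked without an OpenCL device.

```c
#include <assert.h>

/* Hypothetical per-work-item body. In an OpenCL kernel, row and col
 * would come from get_global_id(0) and get_global_id(1) rather than
 * being passed in as arguments. */
void mm_work_item(const int *A, const int *B, int *C,
                  int dimensions, int row, int col)
{
    /* Guard against work-items that fall outside the matrix, which can
     * happen when the global size is rounded up past the matrix size. */
    if (row >= dimensions || col >= dimensions)
        return;

    /* Accumulate in a private variable instead of in global memory. */
    int sum = 0;
    for (int j = 0; j < dimensions; j++)
        sum += A[row * dimensions + j] * B[j * dimensions + col];

    /* Single write of the finished element; no redundant outer loop. */
    C[row * dimensions + col] = sum;
}

/* Host-side stand-in for the 2D index space: calls the work-item body
 * once per (row, col) pair, the way the runtime would schedule it. */
void mm_reference(const int *A, const int *B, int *C, int dimensions)
{
    assert(dimensions > 0);
    for (int row = 0; row < dimensions; row++)
        for (int col = 0; col < dimensions; col++)
            mm_work_item(A, B, C, dimensions, row, col);
}
```

For example, calling `mm_reference` with A = {1, 2, 3, 4} and B = {5, 6, 7, 8} as 2x2 row-major matrices fills C with {19, 22, 43, 50}. The result is the same as what the original kernel produces; only the amount of repeated work and global memory traffic changes.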