6 Replies Latest reply on Apr 9, 2010 10:33 AM by notyou

    Slow Computation/Crashing Driver


      Hello everyone,

      I've been working on improving my matrix multiplication code, and the algorithm now produces correct results on my OpenCL CPU device. The issue I'm having is that the computation is very slow compared to the provided MM sample, or even to a host-based computation. Keep in mind that I am still very new to OpenCL, so any tips to increase my application's performance are welcome.

      Also, when I move to the GPU device, the driver crashes, or at least seems to hang for a very long time, once the matrices reach 256x256. The resulting execution time is greater than both the host CPU calculation and the OpenCL CPU device calculation.

      __kernel void globalMM(__global int *A, __global int *B, __global int *C, int dimensions, int block_size)
      {
          int group_id0 = get_group_id(0);
          int group_id1 = get_group_id(1);
          int local_id0 = get_local_id(0);
          int local_id1 = get_local_id(1);

          int row = (group_id0 * block_size) + local_id0;
          int col = (group_id1 * block_size) + local_id1;

          for(int k = 0; k < dimensions; k++)
          {
              C[row * dimensions + col] = 0;
              for(int j = 0; j < dimensions; j++)
                  C[row * dimensions + col] += A[row * dimensions + j] * B[j * dimensions + col];
          }
      }
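
One thing that stands out in the kernel above: the outer `k` loop resets `C[row * dimensions + col]` to 0 and recomputes the same dot product on every pass, so each work-item does `dimensions * dimensions` multiply-adds instead of `dimensions`, all through global memory. At 256x256 that is 65,536 multiply-adds per work-item, which may be contributing to both the slowness and the apparent hang. A minimal sketch of the same computation without that loop is below. It is not a tuned implementation. The name `globalMM2`, the bounds check, and the use of `get_global_id` are assumptions; `get_global_id` matches the original `row`/`col` only if `block_size` equals the local work size and no global offset is used. The `#ifndef __OPENCL_VERSION__` sections are a hypothetical host-side shim so the file also compiles as plain C for checking; an OpenCL compiler skips them.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shim (assumption, for checking only; not part of OpenCL). */
#include <stddef.h>
#define __kernel
#define __global
static size_t emu_gid[2];                  /* emulated global work-item id */
static size_t get_global_id(unsigned d) { return emu_gid[d]; }
#endif

/* One work-item computes one element of C = A * B (row-major, square). */
__kernel void globalMM2(__global const int *A,
                        __global const int *B,
                        __global int *C,
                        int dimensions)
{
    int row = (int)get_global_id(0);
    int col = (int)get_global_id(1);

    /* Guard in case the global size was rounded up past the matrix size. */
    if (row >= dimensions || col >= dimensions)
        return;

    /* Accumulate in a private variable and write to global memory once. */
    int sum = 0;
    for (int j = 0; j < dimensions; j++)
        sum += A[row * dimensions + j] * B[j * dimensions + col];

    C[row * dimensions + col] = sum;
}

#ifndef __OPENCL_VERSION__
/* Hypothetical helper: run the kernel once per work-item on the host. */
void run_on_host(const int *A, const int *B, int *C, int n)
{
    for (emu_gid[0] = 0; emu_gid[0] < (size_t)n; emu_gid[0]++)
        for (emu_gid[1] = 0; emu_gid[1] < (size_t)n; emu_gid[1]++)
            globalMM2(A, B, C, n);
}
#endif
```

With this version the `block_size` argument is no longer needed, and the host would enqueue a 2D range of at least `dimensions` x `dimensions` work-items. Tiling through `__local` memory, as the MM sample presumably does, would be a further step beyond this sketch.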