Confused about global/local id

Discussion created by notyou on Mar 27, 2010
Latest reply on Mar 31, 2010 by notyou

I'm trying to perform matrix multiplication and I have the code correctly running on the CPU when the arrays are flattened.

When I run it on the GPU, however, only every 16th array index comes out right, which, if I'm not mistaken, means the kernel isn't running for every local ID.

Can someone walk through my thought process below and let me know where I'm wrong?

I am testing with an int array of size 256 (16x16). The max work group size is 256. Here, when I use row (= get_global_id(0)), it comes back with 0-15, so it seems to be creating 16 work groups with 16 threads each. Why doesn't it create one work group with 256 threads?

Then, when I use col (= get_local_id(0)), it only gets the first thread's id and so it only runs for the single column. Can someone explain to me what exactly I'm doing wrong, and why it's not getting the local_id for every thread? Thanks.


__kernel void global_MM(__global int *A, __global int *B, __global int *C, int dimensions)
{
    int value = 0;
    int row = get_global_id(0);
    int col = get_local_id(0);

    for (int i = 0; i < dimensions; i++)
        value += A[row * dimensions + i] * B[i * dimensions + col];

    C[row * dimensions + col] = value;
}
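
To make the symptom concrete, here is a rough sketch in plain C (not OpenCL) that imitates which cells of C the indexing above would write. It relies on assumptions that are not stated in the post: that the kernel is enqueued as a 1D range with a global size of 16, and that get_local_id(0) behaves like the global ID modulo the local size. The names count_written_1d and count_written_2d are hypothetical helpers for illustration only. The second function shows what a 2D range would cover if row and col both came from global IDs, which is one possible fix rather than a confirmed one.

```c
#include <string.h>

#define DIM 16

/* Imitates a 1D launch where row = get_global_id(0) and col = get_local_id(0).
   Assumption: local id == global id % local_size (no global offset).
   Marks every cell of the DIM x DIM output that would be written and
   returns how many distinct cells were touched. */
int count_written_1d(int global_size, int local_size, int *written)
{
    int count = 0;
    memset(written, 0, DIM * DIM * sizeof(int));
    for (int gid = 0; gid < global_size; gid++) {
        int row = gid;               /* stands in for get_global_id(0) */
        int col = gid % local_size;  /* stands in for get_local_id(0)  */
        if (row < DIM && col < DIM && !written[row * DIM + col]) {
            written[row * DIM + col] = 1;
            count++;
        }
    }
    return count;
}

/* Imitates a hypothetical 2D launch of DIM x DIM work-items where
   row = get_global_id(0) and col = get_global_id(1). */
int count_written_2d(int *written)
{
    int count = 0;
    memset(written, 0, DIM * DIM * sizeof(int));
    for (int row = 0; row < DIM; row++) {
        for (int col = 0; col < DIM; col++) {
            written[row * DIM + col] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, a global size of 16 with a local size of 1 touches only indices 0, 16, 32, and so on (one column), which would match the "every 16th index" behaviour. A local size of 16 would instead touch only the diagonal. In both cases just 16 of the 256 cells are written, while the 2D variant covers all of them.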