Hi,

in the matrix multiplication example provided by the SDK, there is a method called runKernelsCL(void)

int MatrixMultiplication::runKernels(void)
{
    ...
    size_t globalThreads[2] = {width1 / 4, height0 / 4};
    size_t localThreads[2] = {blockSize, blockSize};
    ...
}

To my understanding, the program multiplies two matrices (64x64 each), so globalThreads[0] and globalThreads[1] are both 16. The blockSize is 8, so localThreads[0] and localThreads[1] are both 8.
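The arithmetic above can be checked with a small host-side sketch. It is plain C++ with no OpenCL calls; the struct NDRange2D and the helpers makeRange, workGroupsInDim and totalWorkItems are hypothetical names made up for illustration, and the division by 4 is simply copied from the snippet. It also assumes the global size divides evenly by the local size, as it does here.

```cpp
#include <cstddef>

// A 2D range: total work-items (global) and work-group size (local)
// per dimension. Hypothetical type, not part of the SDK sample.
struct NDRange2D {
    std::size_t global[2];
    std::size_t local[2];
};

// Mirrors the two array initializers from the snippet:
// global = {width1 / 4, height0 / 4}, local = {blockSize, blockSize}.
NDRange2D makeRange(std::size_t width1, std::size_t height0,
                    std::size_t blockSize) {
    return NDRange2D{{width1 / 4, height0 / 4}, {blockSize, blockSize}};
}

// Number of work-groups along one dimension (assumes an even division).
std::size_t workGroupsInDim(const NDRange2D& r, int dim) {
    return r.global[dim] / r.local[dim];
}

// Total number of work-items launched across both dimensions.
std::size_t totalWorkItems(const NDRange2D& r) {
    return r.global[0] * r.global[1];
}
```

With width1 = height0 = 64 and blockSize = 8, makeRange gives a 16x16 global range and an 8x8 local range, which works out to 2 work-groups per dimension.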

Can someone explain what localThreads and globalThreads mean here?

Very sorry to bother you all here. I tried hard to understand the Execution Model in the OpenCL 1.0 rev. 48 spec (pages 20-22), but I still don't have a clear understanding of how it works. Maybe someone here can explain it in simple terms using the matrix addition or multiplication example.

Thank you

Hi rolandman,

The matrix multiplication sample in the SDK is a vectorized one. So we multiply 64*64 float matrices using a 16*16 grid of work-items that operate on float4 values. This technique is used to get highly coalesced and aligned global memory reads, which are generally the bottleneck in this algorithm.

Your understanding of the values is correct.
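In the OpenCL execution model, the global size is the total number of work-items per dimension and the local size is the size of one work-group per dimension. With a zero global offset, a work-item's global ID equals its work-group ID times the local size plus its local ID. A minimal host-side sketch of that relationship for one dimension follows; WorkItemId, decompose and compose are hypothetical names used only for illustration, not OpenCL functions.

```cpp
#include <cstddef>

// Position of a work-item along one dimension, expressed as
// (work-group ID, local ID inside that group). Hypothetical type.
struct WorkItemId {
    std::size_t group;
    std::size_t local;
};

// Splits a global ID into its work-group ID and local ID,
// assuming a zero global offset.
WorkItemId decompose(std::size_t globalId, std::size_t localSize) {
    return WorkItemId{globalId / localSize, globalId % localSize};
}

// Rebuilds the global ID from a work-group ID and a local ID.
std::size_t compose(std::size_t groupId, std::size_t localId,
                    std::size_t localSize) {
    return groupId * localSize + localId;
}
```

For the sample's sizes (16 work-items per dimension, 8 per work-group), global ID 11 falls in work-group 1 at local ID 3. Inside a kernel, these values correspond to what get_global_id, get_group_id and get_local_id return.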

With regard to the OpenCL spec, can you please ask specific questions?