In many examples I see that the kernel parameters all have a length that is a multiple of 64. (Take AMD's MatrixMultiplication sample, for example: the matrix size is 64 * 64. In some vector-addition samples, the vector length is also a multiple of 64.)

I posted a topic at http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=137812&enterthread=y and **LeeHowes** told me that "it's hard to believe that the second kernel really has an optimum count of 2".

My questions are:

1. The parameter lengths are multiples of 64 because the wavefront size is 64 on some AMD hardware chips. Is that correct?

2. Does the global_work_size parameter of clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts are needed?

3. For question 2, I guess the answer is work-items. So if global_work_size is less than 64, are there still 64 work-items running in the wavefront? By the same reasoning, is the number of work-items running always a multiple of 64?

4. So if I want to solve an algebra problem on a matrix, vector, or array of arbitrary size, is it better to append some zeroes so that its length is a multiple of 64?

Yes, you are right.

global_work_size specifies only the number of work-items, not the number of wavefronts.

One thing you need to remember is that this is an optimization hint for ATI GPUs. You can run a kernel even if the global work size is not a multiple of 64.
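If you do want to follow the hint for an arbitrary problem size, a common approach is to round the launch size up and let the extra work-items do nothing, rather than padding the data with zeroes. The following is only a minimal sketch: `round_up_to_multiple` is a hypothetical helper name (not part of OpenCL), and the kernel in the comment is an illustrative vector addition, not AMD's sample code. Note also that in OpenCL 1.x, if you pass an explicit local_work_size, global_work_size must be evenly divisible by it, so rounding up is needed in that case anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): round n up to the
 * next multiple of `multiple`, e.g. 100 -> 128 for a multiple of 64. */
static size_t round_up_to_multiple(size_t n, size_t multiple)
{
    assert(multiple > 0);
    size_t remainder = n % multiple;
    return remainder == 0 ? n : n + (multiple - remainder);
}

/* The padded value would be passed as global_work_size to
 * clEnqueueNDRangeKernel(). The kernel then needs the real length n
 * as an extra argument and a bounds check, for example:
 *
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *c,
 *                         const unsigned int n)
 *   {
 *       size_t i = get_global_id(0);
 *       if (i < n)              // surplus work-items skip the work
 *           c[i] = a[i] + b[i];
 *   }
 */
```

For a vector of 100 elements and a multiple of 64, this launches 128 work-items, of which the last 28 fail the bounds check and exit without touching memory.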

It does not need to be a multiple of 64. The wavefront size differs between devices: wavefront sizes are 16, 32, and 64 on ATI GPUs.
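To see what a partial wavefront costs on devices with different wavefront sizes, the arithmetic can be sketched as below. This is an illustration under simplifying assumptions: `wavefronts_needed` and `idle_lanes` are hypothetical helper names, and the work-items are assumed to be packed into wavefronts back to back. In practice wavefronts are formed within each work-group, so the same calculation applies per work-group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of wavefronts needed to cover
 * `work_items` work-items (ceiling division). */
static size_t wavefronts_needed(size_t work_items, size_t wavefront_size)
{
    assert(wavefront_size > 0);
    return (work_items + wavefront_size - 1) / wavefront_size;
}

/* Hypothetical helper: lanes in the last wavefront that have no
 * work-item assigned and therefore do no useful work. */
static size_t idle_lanes(size_t work_items, size_t wavefront_size)
{
    return wavefronts_needed(work_items, wavefront_size) * wavefront_size
           - work_items;
}
```

With 100 work-items, a wavefront size of 64 gives 2 wavefronts with 28 idle lanes, while a wavefront size of 16 gives 7 wavefronts with 12 idle lanes. Rather than hard-coding 64, on OpenCL 1.1 and later you can query clGetKernelWorkGroupInfo() with CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to get a preferred multiple for a given kernel and device.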