In many examples, I find that the sizes of the kernel parameters are all multiples of 64. (Take AMD's MatrixMultiplication sample, for example: the matrix size is 64 * 64. In some vector-addition samples, the vector length is also a multiple of 64.)
I posted a topic at http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=137812&enterthread=y and LeeHowes told me that "it hard to believe that the second kernel really has an optimum count of 2".
My questions are:
1. The parameter lengths are multiples of 64 because the wavefront size is 64 on some AMD hardware. Is that correct?
2. Does the global_work_size parameter of clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts?
3. For question 2, I guess the answer is work-items. So if global_work_size is less than 64, are 64 work-items still running in the wavefront? And by the same reasoning, is the number of work-items actually running always a multiple of 64?
4. So if I want to do matrix, vector, or array algebra on data of arbitrary size, is it better to append zeroes so that the length is a multiple of 64?
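
The padding described in question 4 could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not a confirmed best practice: WAVEFRONT_SIZE is assumed to be 64, and the helper names round_up_to_multiple and pad_with_zeroes are hypothetical, not part of OpenCL or any AMD sample.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the wavefront size is 64, as on some AMD hardware. */
#define WAVEFRONT_SIZE 64

/* Hypothetical helper: round n up to the next multiple of `multiple`. */
static size_t round_up_to_multiple(size_t n, size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Hypothetical helper: copy n floats into a new zero-filled buffer whose
 * length is a multiple of WAVEFRONT_SIZE. The padded length is written to
 * *padded_len. Returns NULL if allocation fails; the caller frees the result. */
static float *pad_with_zeroes(const float *src, size_t n, size_t *padded_len)
{
    *padded_len = round_up_to_multiple(n, WAVEFRONT_SIZE);
    float *dst = calloc(*padded_len, sizeof *dst);  /* calloc zero-fills */
    if (dst != NULL && n > 0) {
        memcpy(dst, src, n * sizeof *dst);
    }
    return dst;
}
```

With a vector of 100 elements, the padded length would be 128, and that value would then be the one passed as global_work_size.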