Originally posted by: Fuxianjun
I find that in many examples the kernel parameters all have a length that is a multiple of 64. Take AMD's MatrixMultiplication sample, for example: the matrix size is 64 * 64. Also, in some vector-addition cases, the vector length is a multiple of 64.
I posted a topic at http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=137812&enterthread=y and LeeHowes told me that "it's hard to believe that the second kernel really has an optimum count of 2".
My questions are:
1. Are the parameter lengths multiples of 64 because the wavefront size is 64 on some AMD hardware?
Yes, you are right.
2. Does the global_work_size parameter of clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts?
global_work_size specifies only the number of work-items, not the number of wavefronts.
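To make the work-item/wavefront distinction concrete, here is a minimal sketch of how many wavefronts a 1-D launch turns into. It assumes the hardware packs each work-group into ceil(local_size / wavefront_size) wavefronts, and that global_size is a multiple of local_size (OpenCL 1.x rejects the launch with CL_INVALID_WORK_GROUP_SIZE otherwise). The function names are hypothetical helpers, not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Integer ceiling division: smallest q with q * d >= n. */
static size_t ceil_div(size_t n, size_t d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/*
 * Hypothetical helper: number of wavefronts launched for a 1-D NDRange.
 * Assumes each work-group of local_size work-items is split into
 * ceil(local_size / wavefront_size) wavefronts, and that global_size
 * is evenly divisible by local_size (the OpenCL 1.x rule when a
 * local_work_size is passed to clEnqueueNDRangeKernel).
 */
static size_t wavefronts_launched(size_t global_size, size_t local_size,
                                  size_t wavefront_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t groups = global_size / local_size;
    return groups * ceil_div(local_size, wavefront_size);
}
```

For example, 1024 work-items in groups of 256 on a 64-wide device give 4 groups of 4 wavefronts each, i.e. 16 wavefronts, while 96 work-items in groups of 48 still need 2 wavefronts because each group of 48 occupies a whole one.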
3. For question 2, I guess the answer is work-items. So if global_work_size is less than 64, are there still 64 work-items running in the wavefront? By the same reasoning, is the number of work-items actually running always a multiple of 64?
One thing to remember is that this is an optimization hint for ATI GPUs. You can run a kernel even if the global work size is not a multiple of 64.
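Why it is only a hint can be seen from the lane-utilization arithmetic. The sketch below assumes that a partially filled wavefront still occupies a full wavefront's worth of SIMD lanes, with the surplus lanes masked off. The kernel runs correctly either way, but those lanes do no useful work. lane_utilization is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fraction of SIMD lanes doing useful work for one
 * work-group. A group of local_size work-items occupies
 * ceil(local_size / wavefront_size) wavefronts (assumption stated above);
 * lanes beyond local_size in the last wavefront sit idle.
 */
static double lane_utilization(size_t local_size, size_t wavefront_size)
{
    assert(local_size > 0 && wavefront_size > 0);
    size_t wavefronts = (local_size + wavefront_size - 1) / wavefront_size;
    return (double)local_size / (double)(wavefronts * wavefront_size);
}
```

With 40 work-items on a 64-wide device, one wavefront is used and 40 / 64 = 62.5% of its lanes are busy. With 65 work-items, two wavefronts are needed and utilization drops to 65 / 128. Sizes that are multiples of the wavefront size reach 100%.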
4. So if I want to solve a linear-algebra problem on a matrix, vector, or array of arbitrary size, is it better to pad it with zeroes so that its length is a multiple of 64?
It does not need to be a multiple of 64. The wavefront size differs between devices: ATI GPUs use wavefront sizes of 16, 32, and 64.
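A common alternative to zero-padding the data (a general practice, not something stated in this thread) is to round only global_work_size up to a multiple of the work-group size and have each work-item return early when its global ID is past the end of the data. The buffers then stay exactly n elements long. The sketch below emulates that on the CPU; round_up, vadd_work_item and run_vadd are hypothetical names, and the OpenCL C guard they stand in for is shown in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m (m > 0). */
static size_t round_up(size_t n, size_t m)
{
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

/*
 * CPU stand-in for one work-item of a guarded vector-add kernel.
 * In OpenCL C the kernel body would begin with
 *     size_t gid = get_global_id(0);
 *     if (gid >= n) return;
 * so the extra work-items created by rounding up touch no memory.
 */
static void vadd_work_item(size_t gid, size_t n,
                           const float *a, const float *b, float *c)
{
    if (gid >= n)
        return; /* padding work-item: does nothing */
    c[gid] = a[gid] + b[gid];
}

/*
 * Emulate a launch with global size round_up(n, local_size) over
 * arrays of exactly n elements. Returns the global size used.
 */
static size_t run_vadd(size_t n, size_t local_size,
                       const float *a, const float *b, float *c)
{
    size_t global_size = round_up(n, local_size);
    for (size_t gid = 0; gid < global_size; ++gid)
        vadd_work_item(gid, n, a, b, c);
    return global_size;
}
```

On the host side, the rounded value is what would be passed as global_work_size, with the real n passed as a kernel argument. Picking local_size as a multiple of the device's wavefront size keeps the wavefronts full except for the last one.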