Fuxianjun
Journeyman III

A problem about kernel parameters' length

I have noticed that in many examples the kernel parameters all have lengths that are multiples of 64. (Take AMD's MatrixMultiplication sample, for example: the matrix size is 64 * 64. In some vector-addition cases, too, the vector length is a multiple of 64.)

I posted a topic at http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=137812&enterthread=y, and LeeHowes told me that "it hard to believe that the second kernel really has an optimum count of 2".

My questions are:

1. Are the parameter lengths multiples of 64 because the wavefront size is 64 on some AMD chips? Is that correct?

2. Does the global_work_size parameter of clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts are needed?

3. For question 2, I guess the answer is work-items. If so, when global_work_size is less than 64, are there still 64 work-items running in the wavefront? By the same reasoning, is the number of work-items actually running always a multiple of 64?

4. So, if I want to solve an algebra problem on a matrix, vector, or array of arbitrary size, is it better to append zeros to it so that its length is a multiple of 64?

genaganna
Journeyman III

Originally posted by: Fuxianjun

1. Are the parameter lengths multiples of 64 because the wavefront size is 64 on some AMD chips? Is that correct?

Yes, you are right.


2. Does the global_work_size parameter of clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts are needed?

global_work_size specifies only the number of work-items, not the number of wavefronts.

3. For question 2, I guess the answer is work-items. If so, when global_work_size is less than 64, are there still 64 work-items running in the wavefront? By the same reasoning, is the number of work-items actually running always a multiple of 64?

One thing to remember is that using multiples of 64 is an optimization hint for ATI GPUs. You can run a kernel even if the global work size is not a multiple of 64.

4. So, if I want to solve an algebra problem on a matrix, vector, or array of arbitrary size, is it better to append zeros to it so that its length is a multiple of 64?

There is no need for it to be a multiple of 64. The wavefront size differs between devices: ATI GPUs have wavefront sizes of 16, 32, and 64.
