1 Reply Latest reply on Aug 14, 2010 3:01 AM by genaganna

    A problem about kernel parameters' length

    Fuxianjun

      I find in many examples that the kernel parameters all have a length that is a multiple of 64 (take AMD's MatrixMultiplication sample, for example, where the matrix size is 64 * 64; in some vector-addition cases, the vector length is also a multiple of 64).

      I posted a topic at http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=137812&enterthread=y and LeeHowes told me that "it hard to believe that the second kernel really has an optimum count of 2".

      My questions are:

      1. The parameters' lengths are multiples of 64 because the wavefront size is 64 on some AMD hardware chips. Is that correct?

      2. Does the global_work_size parameter in clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts are needed?

      3. For question 2, I guess the answer is work-items. So if global_work_size is less than 64, are there still 64 work-items at work in the wavefront? By the same reasoning, is the number of work-items at work always a multiple of 64?

      4. So, if I want to solve an algebra problem on a matrix, vector, or array of arbitrary size, is it better to append some zeroes to it so that its length is a multiple of 64?

        • A problem about kernel parameters' length
          genaganna

           

          Originally posted by: Fuxianjun I find in many examples that the kernel parameters all have a length that is a multiple of 64 (take AMD's MatrixMultiplication sample, for example, where the matrix size is 64 * 64; in some vector-addition cases, the vector length is also a multiple of 64).

           

          I posted a topic at http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=137812&enterthread=y and LeeHowes told me that "it hard to believe that the second kernel really has an optimum count of 2".

           

          My questions are:

           

          1. The parameters' lengths are multiples of 64 because the wavefront size is 64 on some AMD hardware chips. Is that correct?

          Yes, you are right.

           

          2. Does the global_work_size parameter in clEnqueueNDRangeKernel() specify how many work-items are needed, or how many wavefronts are needed?

          global_work_size specifies only the number of work-items, not the number of wavefronts.

          3. For question 2, I guess the answer is work-items. So if global_work_size is less than 64, are there still 64 work-items at work in the wavefront? By the same reasoning, is the number of work-items at work always a multiple of 64?

          One thing to remember is that this is an optimization hint for ATI GPUs. You can run a kernel even if the global work size is not a multiple of 64.
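
          As a rough illustration of why a multiple of the wavefront size is only an optimization hint, the sketch below counts how many wavefronts a given number of work-items would occupy and how many lanes would sit idle in the last, partially filled one. The function names are hypothetical, the wavefront size is passed in because it depends on the device, and the calculation ignores how the range is split into work-groups.

```c
#include <stddef.h>

/* Hypothetical helper: number of wavefronts needed to hold `work_items`
   work-items when each wavefront has `wavefront_size` lanes (rounds up). */
size_t wavefronts_needed(size_t work_items, size_t wavefront_size)
{
    return (work_items + wavefront_size - 1) / wavefront_size;
}

/* Hypothetical helper: lanes left without a work-item in the last,
   partially filled wavefront. */
size_t idle_lanes(size_t work_items, size_t wavefront_size)
{
    return wavefronts_needed(work_items, wavefront_size) * wavefront_size
           - work_items;
}
```

          For example, with an assumed wavefront size of 64, a global work size of 10 still occupies one wavefront, and idle_lanes(10, 64) reports 54 lanes doing no useful work.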

          4. So, if I want to solve an algebra problem on a matrix, vector, or array of arbitrary size, is it better to append some zeroes to it so that its length is a multiple of 64?

          It does not need to be a multiple of 64. The wavefront size differs between devices: wavefront sizes are 16, 32, and 64 for ATI GPUs.
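
          If you do pass an explicit local_work_size to clEnqueueNDRangeKernel(), OpenCL 1.x requires global_work_size to be evenly divisible by it. A common alternative to padding the data with zeroes is to round up only the global work size and let the extra work-items exit early. The helper below is a minimal sketch of that rounding; its name is hypothetical, and it assumes `multiple` is non-zero.

```c
#include <stddef.h>

/* Hypothetical helper: smallest value >= n that is a multiple of
   `multiple`. Assumes multiple > 0. */
size_t round_up_to_multiple(size_t n, size_t multiple)
{
    size_t remainder = n % multiple;
    return remainder == 0 ? n : n + (multiple - remainder);
}
```

          For a vector of 100 elements and a work-group size of 64, round_up_to_multiple(100, 64) gives a global work size of 128. The kernel would then take the real length as an extra argument (say n, a name assumed here) and guard its body with a check such as if (get_global_id(0) < n), so the input buffers themselves need no padding.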