Originally posted by: ravikeshri Hi,
Please let me know how we decide the total number of global and local work-items to get the best performance from our OpenCL kernel. Is it dependent on the total number of Processing Elements in the GPU?
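Make sure the local work-group size is a multiple of the wavefront size, i.e. localWorkGroupSize = k * WavefrontSize, with k = 3 or more.
Make sure the global work size is a multiple of the local work-group size, i.e. globalWorkSize = localWorkGroupSize * m * (number of compute units of the device), with m = 2 or more, so that every compute unit has at least two work-groups to run.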
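As a rough illustration of the two rules above, here is a minimal sketch in plain C that only does the arithmetic. The helper names (round_up, pick_local_size, pick_global_size) and their parameters are made up for this example and are not part of OpenCL or the SDK. It assumes you have already queried the device: the compute-unit count with clGetDeviceInfo (CL_DEVICE_MAX_COMPUTE_UNITS), the maximum work-group size with CL_DEVICE_MAX_WORK_GROUP_SIZE, and the wavefront size either from the documentation or, on OpenCL 1.1 and later, approximated by CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE from clGetKernelWorkGroupInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m (m must be non-zero). */
static size_t round_up(size_t n, size_t m)
{
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

/* Local size = waves_per_group wavefronts, capped at the device limit
 * while staying a whole number of wavefronts.
 * Assumes max_wg_size >= wavefront_size. */
static size_t pick_local_size(size_t wavefront_size,
                              size_t waves_per_group,
                              size_t max_wg_size)
{
    assert(wavefront_size > 0 && waves_per_group > 0);
    assert(max_wg_size >= wavefront_size);

    size_t local = wavefront_size * waves_per_group;
    if (local > max_wg_size) {
        /* Largest multiple of the wavefront size that still fits. */
        local = (max_wg_size / wavefront_size) * wavefront_size;
    }
    return local;
}

/* Global size = problem size padded up to a multiple of
 * (local_size * compute_units), and at least min_groups_per_cu
 * work-groups per compute unit. */
static size_t pick_global_size(size_t problem_size,
                               size_t local_size,
                               size_t compute_units,
                               size_t min_groups_per_cu)
{
    size_t step = local_size * compute_units;
    size_t global = round_up(problem_size, step);
    size_t min_global = step * min_groups_per_cu;
    return global < min_global ? min_global : global;
}
```

For example, with a wavefront size of 64, a 256 work-item limit and 10 compute units, pick_local_size(64, 3, 256) gives 192, and pick_global_size(100000, 192, 10, 2) gives 101760. Because the global size is padded past the real problem size, the kernel needs a bounds check such as if (get_global_id(0) >= n) return; before touching memory. Treat these numbers as a starting point and time a few variants on your own kernel.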
For more details, read chapter 4 of ATI_Stream_SDK_Programming_Guide.pdf.