Originally posted by: ravikeshri
Hi,
How should we choose the global and local work sizes (numbers of work-items) to get the best performance from an OpenCL kernel? Do they depend on the total number of processing elements in the GPU?
Make sure the local work-group size is a multiple of the wavefront size, i.e. localWorkGroupSize = k * WavefrontSize, where k is 3 or more.
Make sure the global work size is a multiple of the local work-group size and gives each compute unit two or more work-groups, i.e. globalWorkSize = localWorkGroupSize * k * NumComputeUnits, where k is 2 or more.
For more details, read Chapter 4 of ATI_Stream_SDK_Programming_Guide.pdf.
Thank you very much for the answer and for the reference, genaganna. I will go through the guide now; it looks very informative.