Global worksize

Discussion created by pratapk on Aug 4, 2011
Latest reply on Aug 8, 2011 by LeeHowes
Optimal global work size

We need to choose the local work-group size as a multiple of the warp size for optimal performance.
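A minimal sketch of why the multiple matters, assuming the hardware runs each work-group as whole warps of warp_size lanes (32 on NVIDIA GPUs; AMD calls them wavefronts, typically 64 lanes). The helper names warps_per_group and idle_lanes are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: warps needed to run one work-group of
   local_size work-items, rounding up to whole warps. */
static size_t warps_per_group(size_t local_size, size_t warp_size)
{
    assert(local_size > 0 && warp_size > 0);
    return (local_size + warp_size - 1) / warp_size;
}

/* Hypothetical helper: lanes left idle in the last, partially
   filled warp of each work-group. */
static size_t idle_lanes(size_t local_size, size_t warp_size)
{
    return warps_per_group(local_size, warp_size) * warp_size - local_size;
}
```

For example, a local size of 100 with 32-wide warps occupies 4 warps (128 lanes) and leaves 28 lanes idle in every work-group, while 64 or 256 leave none.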

Does the same apply to the global work size? I've seen the following code in one of NVIDIA's OpenCL slides:


size_t localWorkSize = 256; // or 64

// Round N up to a whole number of work-groups
size_t numberWorkGroups = (N + localWorkSize - 1) / localWorkSize;

size_t globalWorkSize = numberWorkGroups * localWorkSize;
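The slide's rounding can be written as a self-contained C function. The name round_up_global is hypothetical; the arithmetic is the same ceiling division: adding local_size - 1 before the truncating integer division rounds the group count up instead of down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round the problem size n up to the next
   multiple of local_size (same arithmetic as the slide code). */
static size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t num_groups = (n + local_size - 1) / local_size; /* ceil(n / local_size) */
    return num_groups * local_size;
}
```

On the host, the result would be passed as the global_work_size argument of clEnqueueNDRangeKernel, with localWorkSize as local_work_size. For N = 1000 and a local size of 256 it gives 1024; an N that is already a multiple (e.g. 1024) is left unchanged.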


But rounding up and multiplying makes the global work size larger than N:


1) Some work-items will then have a global ID of N or more. How should we handle them?

2) This launches some OpenCL work-items that do no useful work. Is there any advantage to doing it?
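A minimal CPU-side sketch of the usual pattern, assuming a hypothetical vector-add kernel: each work-item compares its global ID with N and returns early, which is what `if (get_global_id(0) >= n) return;` does at the top of an OpenCL C kernel. The loop in run_ndrange only stands in for a real kernel launch, and all function names here are hypothetical. The padding_items helper shows the cost is bounded: rounding up adds at most localWorkSize - 1 work-items, all in the last work-group. As for an advantage, in OpenCL 1.x clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE when an explicitly specified local_work_size does not evenly divide global_work_size, so rounding up is what lets you choose the local size yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body as plain C. In OpenCL C the guard would be:
   size_t gid = get_global_id(0); if (gid >= n) return;                 */
static void vec_add_item(size_t gid, const float *a, const float *b,
                         float *c, size_t n)
{
    if (gid >= n)   /* padding work-item: no element to process */
        return;
    c[gid] = a[gid] + b[gid];
}

/* CPU stand-in for an NDRange of global_size work-items. */
static void run_ndrange(size_t global_size, const float *a, const float *b,
                        float *c, size_t n)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_item(gid, a, b, c, n);
}

/* Padding work-items added by rounding n up to a multiple of
   local_size; always between 0 and local_size - 1. */
static size_t padding_items(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t global_size = (n + local_size - 1) / local_size * local_size;
    return global_size - n;
}
```

For N = 1000 and localWorkSize = 256, the launch has 1024 work-items, of which 24 (about 2.3%) exit at the guard without touching memory.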