According to https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html, the values specified in global_work_size must be evenly divisible by the corresponding values specified in local_work_size. However, I set local_work_size=256 and global_work_size=384 in my program, and it runs without reporting any error.
I am using AMD APP SDK 3.0. My questions are as follows:
1. How does the driver handle the case where global_work_size is not evenly divisible by local_work_size?
2. Will this cause any performance degradation?
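
To make the numbers concrete, here is a minimal sketch in plain C (no OpenCL calls) of the arithmetic involved. The helper names `remainder_items` and `round_up_global` are hypothetical and only for illustration. Rounding the global size up to the next multiple of the local size is the workaround I have commonly seen suggested; I am not assuming it is what the AMD driver does internally.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-items left over after
 * filling as many whole work-groups of size `local` as possible.
 * Assumes local > 0. */
static size_t remainder_items(size_t global, size_t local)
{
    return global % local;
}

/* Hypothetical helper: smallest multiple of `local` that is
 * greater than or equal to `global`. Assumes local > 0. */
static size_t round_up_global(size_t global, size_t local)
{
    return ((global + local - 1) / local) * local;
}
```

With my values, `remainder_items(384, 256)` is 128, so the second work-group would only be half full, and `round_up_global(384, 256)` is 512. If the global size were padded this way, I assume the kernel would need a bounds check so the extra work-items do nothing.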