With OpenCL 1.1, the global_work_offset parameter of clEnqueueNDRangeKernel can be used to specify an offset that is added when calculating the global ID of a work-item.
In my understanding, this should mean that both get_global_id() and get_group_id() are affected by the offset. However, the AMD implementation seems to apply the offset only to get_global_id(), not to get_group_id().
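To make the behaviour concrete, here is a minimal host-side model in C. It is not OpenCL code, and model_ndrange_ids is a hypothetical helper. It assumes the OpenCL 1.1 definition global_id = global_offset + group_id * local_size + local_id for a 1-D launch, and records the IDs each work-item would see. Only the global IDs shift with the offset; the group IDs still start at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of what a 1-D NDRange launch reports through
 * get_global_id(0) and get_group_id(0) under the OpenCL 1.1 rules:
 *   global_id = global_offset + group_id * local_size + local_id
 * group_id always runs from 0 to num_groups - 1, whatever the offset is. */
static void model_ndrange_ids(size_t global_offset, size_t global_size,
                              size_t local_size,
                              size_t *global_ids, size_t *group_ids)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t g = 0; g < num_groups; ++g) {
        for (size_t l = 0; l < local_size; ++l) {
            size_t i = g * local_size + l;  /* linear work-item index */
            global_ids[i] = global_offset + g * local_size + l;
            group_ids[i] = g;               /* offset is NOT applied here */
        }
    }
}
```

For example, with an offset of 100, a global size of 8 and a local size of 4, the global IDs run from 100 to 107 while the group IDs are only 0 and 1, which is the behaviour described above for the AMD implementation.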
Unfortunately, I don't have access to NVIDIA's OpenCL 1.1 implementation to test what it does...
Thanks for your reply, Micah.
You're right, the spec does define it this way, but I don't think it makes much sense... I thought the offset could be used, for example, to easily split a task between multiple devices. Say you want to execute half of the task on GPU1 and the rest on GPU2: you could simply set the offset for GPU2 to half of the overall problem size. But if the kernel uses get_group_id(), this wouldn't work, because the group IDs on GPU2 would start at 0 again...
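A minimal sketch of that split, again as a plain C model rather than real host code: split_launch and absolute_group_id are hypothetical helpers, and the sketch assumes the problem size divides evenly into whole work-groups per device. Since get_group_id(0) restarts at 0 for every launch, a kernel that needs a problem-wide group index would have to add get_global_offset(0) / get_local_size(0) back in itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: split a 1-D problem of `total` work-items evenly
 * between `num_devices` devices by giving each launch its own offset. */
typedef struct {
    size_t offset;      /* would be passed as global_work_offset[0] */
    size_t global_size; /* would be passed as global_work_size[0] */
} launch1d;

static launch1d split_launch(size_t total, size_t local_size,
                             size_t num_devices, size_t device)
{
    assert(local_size > 0 && num_devices > 0 && device < num_devices);
    /* Keep every chunk a multiple of local_size so each launch consists of
     * whole work-groups (OpenCL 1.x requires global_work_size to be evenly
     * divisible by local_work_size when a local size is given). */
    assert(total % (local_size * num_devices) == 0);
    size_t chunk = total / num_devices;
    launch1d l = { device * chunk, chunk };
    return l;
}

/* Problem-wide group index, i.e. what the kernel would compute as
 *   get_global_offset(0) / get_local_size(0) + get_group_id(0)
 * This is exact only when the offset is a multiple of the local size. */
static size_t absolute_group_id(size_t offset, size_t local_size,
                                size_t group_id)
{
    assert(local_size > 0 && offset % local_size == 0);
    return offset / local_size + group_id;
}
```

With 2048 work-items, a local size of 64 and two GPUs, the second launch gets an offset of 1024, so its group 0 is really group 16 of the whole problem.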
I always assumed that get_global_id() / get_local_size() == get_group_id() (using integer division), but apparently that no longer holds once an offset is used...
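Worked through with the OpenCL 1.1 definition of the global ID for a dimension d (written here with the built-in names), the old identity only survives if the offset is subtracted first:

```latex
\mathrm{get\_global\_id}(d) = \mathrm{get\_global\_offset}(d)
  + \mathrm{get\_group\_id}(d)\cdot\mathrm{get\_local\_size}(d)
  + \mathrm{get\_local\_id}(d)

\Rightarrow\quad
\mathrm{get\_group\_id}(d) =
  \left\lfloor \frac{\mathrm{get\_global\_id}(d) - \mathrm{get\_global\_offset}(d)}
                    {\mathrm{get\_local\_size}(d)} \right\rfloor

\text{e.g. offset } 1024,\ \text{local size } 64,\ \text{global ID } 1221:\quad
\left\lfloor \tfrac{1221 - 1024}{64} \right\rfloor = 3,
\qquad\text{whereas}\quad \left\lfloor \tfrac{1221}{64} \right\rfloor = 19
```

With a zero offset this reduces to the get_global_id() / get_local_size() == get_group_id() rule from OpenCL 1.0.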
Is there any specific reason for this? What else could the offset be used for other than splitting tasks?
Okay, I can see why the global ID is simpler to compute if the group ID is not affected by the offset. But I still find it quite confusing, and I can't think of an example where you would want to offset the global ID but not the group ID...