From my reading of the OpenCL spec, the maximum global index in any dimension is 2^32-1 on a device whose size_t is 32 bits wide.
The spec states that an error (CL_INVALID_GLOBAL_WORK_SIZE) is returned when the specified global work size exceeds the range of the size_t type on the device (see page 133 of the OpenCL 1.1 spec).
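To make that limit concrete, here is a minimal sketch of the range check in plain C. It does not call OpenCL at all: the helper names are hypothetical, and the address width is assumed to be the value a device reports for CL_DEVICE_ADDRESS_BITS (32 or 64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest value a device-side size_t can hold, given the device's
 * address width in bits (assumed to be at most 64). */
static uint64_t max_device_size(unsigned address_bits)
{
    assert(address_bits <= 64);
    return address_bits == 64 ? UINT64_MAX
                              : (UINT64_C(1) << address_bits) - 1;
}

/* Hypothetical helper: true if a global work size in one dimension
 * fits within the device's size_t range. */
static bool global_size_fits(uint64_t global_size, unsigned address_bits)
{
    return global_size <= max_device_size(address_bits);
}
```

For a 32-bit device, global_size_fits(4294967295, 32) is true and global_size_fits(4294967296, 32) is false, which is the case where the runtime would be expected to reject the enqueue.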
As per the spec, there is no limit on the number of work-groups a device needs to support. Obviously, we cannot exceed 2^32-1, as the global thread ID would no longer be representable in 32 bits.
I wrote a test program, and it ran successfully with 2^25 work-groups. Beyond that, it failed for lack of system resources.
So I think GPUs can execute as many work-groups, in round-robin fashion, as system resources allow, the only upper limit being 2^32-1.
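The relationship between global size and work-group count can be sketched as follows. This is only an illustration in plain C with a hypothetical helper name; it relies on the OpenCL 1.1 rule that the global size in a dimension must be an exact multiple of the local size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of work-groups in one dimension. OpenCL 1.1 requires the
 * global size to be an exact multiple of the local size. */
static uint64_t work_group_count(uint64_t global_size, uint64_t local_size)
{
    assert(local_size > 0);
    assert(global_size % local_size == 0);
    return global_size / local_size;
}
```

Assuming a local size of 64 (my test's actual local size is not stated here), 2^25 work-groups corresponds to a global size of 2^31, which is still below the 32-bit ceiling. With a local size of 1, the work-group count equals the global size, so it tops out at 2^32-1.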
I hope this settles the remaining questions on this topic.