Hi all,
As per the spec, there is no limit on the number of workgroups each device needs to support. Obviously we cannot exceed 2^32-1, as the global thread ID would no longer be representable in 32 bits.
I wrote a test program, and it ran successfully for up to 2^25 workgroups. Beyond that, it failed due to a lack of system resources.
So I think GPUs can execute as many workgroups, in round-robin fashion, as system resources allow, the only upper limit being 2^32-1.
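To make the arithmetic behind that limit concrete, here is a rough sketch. It assumes the total global size has to fit in an unsigned 32-bit value. The helper name `max_workgroups` and the example local sizes are my own illustration, not taken from the spec or from any API.

```python
# Hypothetical helper, not part of any GPU API: plain arithmetic on the 32-bit limit.

def max_workgroups(local_size: int, id_bits: int = 32) -> int:
    """Upper bound on the workgroup count such that the total number of
    work-items (workgroups * local_size) still fits in an unsigned
    integer of id_bits bits."""
    if local_size < 1:
        raise ValueError("local_size must be at least 1")
    max_global_size = 2**id_bits - 1  # largest value representable in id_bits bits
    return max_global_size // local_size


if __name__ == "__main__":
    # Assumed local sizes, chosen only for illustration.
    for local_size in (1, 64, 256):
        print(local_size, max_workgroups(local_size))
```

With a local size of 1 the bound is the full 2^32-1. Larger workgroups lower it proportionally, so the workgroup count that resources actually allow may be hit well before or after this bound, depending on the local size.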
I hope there are no more open issues on this topic.