I have a Radeon HD 4870 X2 card (RV770 GPU), and I found experimentally that the number of work-groups processed simultaneously is 32.

I don't understand where this number comes from.

As far as I know, CL_DEVICE_MAX_COMPUTE_UNITS = 10 for RV770. Why 32, then?

Additional info:

a) works fine:

globalWorkSize = 8192

localWorkSize = 256

b) doesn't work:

globalWorkSize = 8448

localWorkSize = 256

c) doesn't work:

globalWorkSize = 16384

localWorkSize = 256

The local work size is obtained from clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...).
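For reference, the three cases above correspond to the following work-group counts (global size divided by local size). This is only a small sketch in plain C, not OpenCL host code; the helper name work_group_count is made up for illustration, and it assumes a 1-D range whose global size is an exact multiple of the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): number of work-groups in a
   1-D NDRange. Assumes global_work_size is an exact multiple of
   local_work_size, as in cases a), b) and c). */
size_t work_group_count(size_t global_work_size, size_t local_work_size)
{
    assert(local_work_size != 0);
    assert(global_work_size % local_work_size == 0);
    return global_work_size / local_work_size;
}
```

With localWorkSize = 256, work_group_count(8192, 256) is 32 for case a), work_group_count(8448, 256) is 33 for case b), and work_group_count(16384, 256) is 64 for case c). So the failures start as soon as the count goes above 32.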

So I have the following questions:

1) How many work-groups can run simultaneously?

2) If that number exists and is finite, does the number of simultaneously processed work-groups depend on the GPU type?

3) Also, if that number exists and is finite, how can it be retrieved programmatically by querying the device (as the number of SIMD engines can be with clGetDeviceInfo(..., CL_DEVICE_MAX_COMPUTE_UNITS, ...))?

The 4870 has 10 processing units, each with 16 5-way SIMD cores, so 10 * 16 * 5 = 800, which matches the specification.
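Written out as a tiny C sketch, the same arithmetic looks like this. The function and parameter names are only illustrative, and the figures are the ones quoted above rather than anything queried from the device.

```c
/* Illustrative helper: total ALU count as
   processing units * SIMD cores per unit * lanes per core. */
unsigned total_stream_processors(unsigned processing_units,
                                 unsigned simd_cores_per_unit,
                                 unsigned lanes_per_core)
{
    return processing_units * simd_cores_per_unit * lanes_per_core;
}
```

Calling total_stream_processors(10, 16, 5) gives 800.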