If I call clEnqueueNDRangeKernel(...) with local_work_size set to NULL, is there any way to find out what work-group size the OpenCL implementation chose, i.e. how many work-items (kernel instances) run in each work-group? I've had a look at the stats in CodeXL, but I don't understand much of what is being reported. I'm assuming that what I'm looking for is buried somewhere in all those numbers.
(On a related note, if work_dim is 2 and local_work_size is set to (1, 1), would this effectively result in only one kernel instance running in each work-group?)
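
To make the second case concrete, here is a minimal sketch of the arithmetic as I understand it, in plain C with no OpenCL calls. The helper names (groups_in_dim, items_per_group_2d, total_groups_2d) are hypothetical and not part of the OpenCL API, and the sketch assumes the global size is an exact multiple of the local size in each dimension, which OpenCL 1.x requires when a local size is given explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL function: number of work-groups
 * along one dimension. Assumes global_size is an exact multiple of
 * local_size (an assumption of this sketch). */
size_t groups_in_dim(size_t global_size, size_t local_size)
{
    assert(local_size != 0);
    return global_size / local_size;
}

/* Work-items per work-group for a 2D range: the product of the
 * local sizes in the two dimensions. */
size_t items_per_group_2d(const size_t local[2])
{
    return local[0] * local[1];
}

/* Total number of work-groups for a 2D range. */
size_t total_groups_2d(const size_t global[2], const size_t local[2])
{
    return groups_in_dim(global[0], local[0]) *
           groups_in_dim(global[1], local[1]);
}
```

With an illustrative global size of (8, 4) and a local size of (1, 1), items_per_group_2d gives 1 and total_groups_2d gives 32, so under these assumptions every work-item would sit in its own work-group.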