When I collect a trace from my application, the AMD APP Profiler reports:
I leave the local work size as NULL and execute the kernel as a 1D NDRange.
On the 5870 GPU, MAX_WORK_GROUP_SIZE is 256, so one work-group has at most 256 work-items, or 4 wavefronts of 64 work-items each.
1. Since it has 5120 wavefronts, does that mean it has 5120/4 = 1280 work-groups?
2. AFAIK, work-groups are distributed (equally??) across the SIMD engines. The 5870 has 20 SIMD engines, so each SIMD engine gets 1280/20 = 64 work-groups (= 64 x 4 wavefronts = 256 wavefronts = 16384 work-items). Is that correct?
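
To make the arithmetic in both questions easy to check, here is a small Python sketch of it. It only reproduces the numbers under the assumptions stated above: that the runtime really picks 256 as the work-group size when the local size is NULL, that a wavefront is 64 work-items, and that work-groups are spread evenly over the 20 SIMD engines. None of these is confirmed by the profiler output, and the function and variable names are made up for illustration.

```python
# Assumed values taken from the question, not queried from a device.
WAVEFRONT_SIZE = 64         # work-items per wavefront (256 work-items = 4 wavefronts)
WORK_GROUP_SIZE = 256       # assumes the runtime chose MAX_WORK_GROUP_SIZE
NUM_SIMD_ENGINES = 20       # SIMD engines on the 5870
TOTAL_WAVEFRONTS = 5120     # wavefront count from the question


def breakdown(total_wavefronts, work_group_size, num_simds, wavefront_size):
    """Derive work-group and per-SIMD counts, assuming an even distribution."""
    wavefronts_per_group = work_group_size // wavefront_size
    work_groups = total_wavefronts // wavefronts_per_group
    groups_per_simd = work_groups // num_simds
    wavefronts_per_simd = groups_per_simd * wavefronts_per_group
    work_items_per_simd = wavefronts_per_simd * wavefront_size
    return {
        "wavefronts_per_group": wavefronts_per_group,
        "work_groups": work_groups,
        "groups_per_simd": groups_per_simd,
        "wavefronts_per_simd": wavefronts_per_simd,
        "work_items_per_simd": work_items_per_simd,
    }


result = breakdown(TOTAL_WAVEFRONTS, WORK_GROUP_SIZE, NUM_SIMD_ENGINES, WAVEFRONT_SIZE)
for name, value in result.items():
    print(f"{name}: {value}")
```

If the runtime picks a smaller work-group size than 256, changing WORK_GROUP_SIZE shows how the work-group count changes while the total number of wavefronts stays the same.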