All stream cores within a compute unit execute the same instruction on each cycle. A work-item can issue one VLIW instruction per clock cycle. The block of work-items that are executed together is called a wavefront. To hide latencies due to memory accesses and processing element operations, up to four work-items from the same wavefront are pipelined on the same stream core. For example, on the ATI Radeon™ HD 5870 GPU compute device, the 16 stream cores execute the same instructions for four cycles, which effectively appears as a 64-wide compute unit in execution width.
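The arithmetic behind the "64-wide" figure can be sketched as follows. This is only an illustration of the multiplication described above; the function name `wavefront_width` and its parameters are made up for this example and are not part of any AMD or OpenCL API.

```python
def wavefront_width(stream_cores: int, pipelined_work_items: int) -> int:
    """Effective execution width of a compute unit.

    Hypothetical helper: multiplies the number of stream cores by the
    number of work-items pipelined on each stream core.
    """
    if stream_cores <= 0 or pipelined_work_items <= 0:
        raise ValueError("both arguments must be positive")
    return stream_cores * pipelined_work_items


if __name__ == "__main__":
    # HD 5870 figures from the text: 16 stream cores, four work-items each.
    print(wavefront_width(16, 4))  # prints 64
```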
The size of wavefronts can differ on different GPU compute devices. For example, the ATI Radeon™ HD 5400 series graphics cards have a wavefront size of 32 work-items. The ATI Radeon™ HD 5800 series has a wavefront size of 64 work-items.
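To see why the wavefront size matters when picking a work-group size, here is a rough sketch that counts how many wavefronts a work-group would occupy on each of the two device families mentioned. It assumes that a partially filled wavefront still occupies a whole wavefront, so the leftover lanes sit idle; the helper name `wavefronts_needed` and the work-group size of 100 are invented for the example.

```python
from typing import Tuple


def wavefronts_needed(work_group_size: int, wavefront_size: int) -> Tuple[int, int]:
    """Return (number of wavefronts, idle lanes in the last wavefront).

    Hypothetical helper. Assumption: a partially filled wavefront still
    takes up a full wavefront, so its unused lanes do no useful work.
    """
    if work_group_size <= 0 or wavefront_size <= 0:
        raise ValueError("both arguments must be positive")
    count = -(-work_group_size // wavefront_size)  # ceiling division
    idle = count * wavefront_size - work_group_size
    return count, idle


if __name__ == "__main__":
    # Wavefront sizes from the text: 32 (HD 5400 series), 64 (HD 5800 series).
    for size in (32, 64):
        print(size, wavefronts_needed(100, size))
    # prints:
    # 32 (4, 28)
    # 64 (2, 28)
```

Under that assumption, a work-group size that is a multiple of the device's wavefront size leaves no idle lanes, for example `wavefronts_needed(128, 64)` gives `(2, 0)`.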
Compute units operate independently of each other, so it is possible for each compute unit to execute different instructions.
From the AMD Accelerated Parallel Processing Programming Guide