Here is an overview of the GCN core including compute units. The first few pages talk about GCN relative to earlier AMD GPU cores so you might want to fast-forward through those pages if you haven't worked with our VLIW shader cores:
TL;DR version - each CU contains four 16-way SIMDs, referred to as Vector ALUs, along with a shared scalar unit. Each SIMD/VALU works on 64-item wavefronts, performing 64 operations (1 VALU instruction) in 4 clocks along with optional scalar, branch, and other instructions. Each SIMD/VALU has 10 associated program counters and can switch to a different thread on every VALU (4 clock) boundary to hide latency.
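To make the arithmetic behind those numbers concrete, here is a rough sketch in Python. The constants come straight from the summary above; the function names are made up for illustration, and the "resident wavefronts" figure assumes one wavefront per program counter. Real occupancy is usually lower because it also depends on register and local memory usage.

```python
# Figures quoted in the summary above.
SIMDS_PER_CU = 4                 # four SIMDs (Vector ALUs) per compute unit
LANES_PER_SIMD = 16              # each SIMD is 16 lanes wide
WAVEFRONT_SIZE = 64              # work-items per wavefront
PROGRAM_COUNTERS_PER_SIMD = 10   # program counters associated with each SIMD


def clocks_per_valu_instruction(wavefront_size=WAVEFRONT_SIZE,
                                lanes=LANES_PER_SIMD):
    """Clocks for one VALU instruction to cover a whole wavefront."""
    # Ceiling division: a 64-item wavefront on 16 lanes takes 4 passes.
    return -(-wavefront_size // lanes)


def max_wavefronts_per_cu():
    """Upper bound on resident wavefronts per CU.

    Assumption: one wavefront per program counter.
    """
    return SIMDS_PER_CU * PROGRAM_COUNTERS_PER_SIMD


def max_work_items_per_cu():
    """Upper bound on resident work-items per CU (same assumption)."""
    return max_wavefronts_per_cu() * WAVEFRONT_SIZE


def peak_lane_ops_per_clock():
    """Vector lane operations per clock across all SIMDs in one CU."""
    return SIMDS_PER_CU * LANES_PER_SIMD


if __name__ == "__main__":
    print("clocks per VALU instruction:", clocks_per_valu_instruction())
    print("max resident wavefronts per CU:", max_wavefronts_per_cu())
    print("max resident work-items per CU:", max_work_items_per_cu())
    print("peak lane ops per clock per CU:", peak_lane_ops_per_clock())
```

The 10 program counters per SIMD are what give the scheduler other work to switch to on each 4-clock boundary, so a stalled wavefront does not leave the SIMD idle.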
The primary programming models for our GPUs are OpenCL and HCC/HIP: