Ryta,
The answer is that it's a little bit of both. Yes, wavefronts do run in parallel, but only as many as the hardware can handle at once. Those that are stalled or waiting to execute sit in the thread queue/run queue/ultra-threaded dispatcher, or whatever you want to call it.
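To make "parallel, but only as many as fit" concrete, here is a minimal Python sketch of a dispatcher queue. It is a toy model, not AMD's actual scheduler. `simulate_dispatch`, `capacity` (how many wavefronts the hardware holds at once) and the per-wavefront clause counts are hypothetical. It also assumes every running wavefront finishes exactly one clause per round and then gives up its slot. Real hardware swaps wavefronts on stalls rather than in lockstep rounds.

```python
from collections import deque


def simulate_dispatch(clauses_per_wavefront, capacity):
    """Toy model of a wavefront dispatcher (hypothetical, for illustration only).

    clauses_per_wavefront: entry i is how many clauses wavefront i must run.
    capacity: maximum number of wavefronts the hardware can execute at once.
    Returns a list of rounds; each round lists the wavefront ids that ran in it.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # Every wavefront starts out waiting in the dispatcher's queue.
    queue = deque(range(len(clauses_per_wavefront)))
    remaining = list(clauses_per_wavefront)
    rounds = []
    while queue:
        # Take as many waiting wavefronts as the hardware can hold; the rest keep waiting.
        running = [queue.popleft() for _ in range(min(capacity, len(queue)))]
        rounds.append(running)
        for wf in running:
            # Each running wavefront completes one clause, then gives up its slot.
            remaining[wf] -= 1
            if remaining[wf] > 0:
                # Not finished: back to the end of the queue until scheduled again.
                queue.append(wf)
    return rounds


if __name__ == "__main__":
    # Four wavefronts needing 2, 1, 2 and 1 clauses, on hardware that holds two at a time.
    for i, wfs in enumerate(simulate_dispatch([2, 1, 2, 1], capacity=2)):
        print(f"round {i}: wavefronts {wfs}")
```

With a capacity of two, that example runs wavefronts 0 and 1, then 2 and 3, then the two unfinished ones (0 and 2) once they come back around the queue.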
As for the terminology, we are working on that, but the compute world and the graphics world often use different terms for the same thing.
As for how many threads can run on a SIMD, pages 10 and 11 of the CGO2008 slides cover that. In compute shader mode with LDS, the limit is 1024.
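As a quick worked check, assume the 64-thread wavefronts described below. Then the 1024-thread limit corresponds to 1024 / 64 = 16 wavefronts. The small Python helper here covers the general case. It rounds up, because a partially filled wavefront still occupies a whole wavefront. The name `wavefronts_for` is just for illustration.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront, per the breakdown in this post


def wavefronts_for(threads: int, wavefront_size: int = WAVEFRONT_SIZE) -> int:
    """Number of wavefronts needed to cover `threads` threads (a partial one counts as one)."""
    if threads < 0:
        raise ValueError("thread count must be non-negative")
    # Integer ceiling division: round up to the next whole wavefront.
    return (threads + wavefront_size - 1) // wavefront_size


if __name__ == "__main__":
    print(wavefronts_for(1024))  # 16 wavefronts at the compute-shader-with-LDS limit
    print(wavefronts_for(100))   # 2: the second wavefront is only partially filled
```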
To break down how it works as simply as possible:
Your execution domain is broken into blocks of 64 threads, called wavefronts, which are scheduled to execute on a SIMD.
When executing on a SIMD, each wavefront is broken into 4 groups of 16 threads, and each group executes in turn on the SIMD's 16 thread processors, which are arranged as four 2x2 blocks.
Each thread processor processes up to 5 instructions at a time for a single thread (one 5-wide instruction group); a sequence of these instruction groups makes up an ALU clause.
A wavefront keeps executing on the SIMD for the duration of that ALU CF clause, after which it returns to the thread dispatcher until it is scheduled to execute again.
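The breakdown above can be sketched in Python as a mapping from a flat thread index to where that thread lands on a SIMD. This is an illustrative model under stated assumptions, not documented hardware behavior. Threads are assumed to be assigned to wavefronts and lanes in simple linear order. The four groups of 16 are assumed to issue one after another, so one instruction group for a full wavefront takes 4 clocks. The lane-to-2x2-block layout is a hypothetical choice. `Placement`, `place_thread` and the other names are made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple

WAVEFRONT_SIZE = 64  # threads per wavefront
GROUP_SIZE = 16      # threads per group: one per thread processor in the SIMD
SLOTS_PER_TP = 5     # instructions a thread processor handles for one thread at a time


@dataclass(frozen=True)
class Placement:
    wavefront: int              # which 64-thread wavefront the thread belongs to
    group: int                  # which group of 16 within the wavefront (0..3)
    thread_processor: int       # which of the SIMD's 16 thread processors runs it (0..15)
    block: int                  # which 2x2 block of thread processors (0..3), hypothetical layout
    block_pos: Tuple[int, int]  # (row, col) inside that 2x2 block, hypothetical layout


def place_thread(tid: int) -> Placement:
    """Map a flat thread index to its assumed position on a SIMD (linear order assumed)."""
    if tid < 0:
        raise ValueError("thread index must be non-negative")
    wavefront, lane = divmod(tid, WAVEFRONT_SIZE)  # 64 threads per wavefront
    group, tp = divmod(lane, GROUP_SIZE)           # 4 groups of 16 lanes
    block, pos = divmod(tp, 4)                     # 4 thread processors per 2x2 block
    row, col = divmod(pos, 2)                      # row-major position inside the block
    return Placement(wavefront, group, tp, block, (row, col))


def clocks_per_instruction_group() -> int:
    """Clocks for a full wavefront to issue one instruction group (4 groups of 16, in turn)."""
    return WAVEFRONT_SIZE // GROUP_SIZE


def max_ops_per_instruction_group() -> int:
    """Upper bound on scalar ops per wavefront per instruction group (all 5 slots filled)."""
    return WAVEFRONT_SIZE * SLOTS_PER_TP


if __name__ == "__main__":
    print(place_thread(70))  # second wavefront, first group of 16, thread processor 6
    print(clocks_per_instruction_group(), max_ops_per_instruction_group())
```

Under these assumptions, a fully packed instruction group does 64 x 5 = 320 scalar operations per wavefront over 4 clocks. How well the compiler fills the 5 slots decides how close real code gets to that bound.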