Let's assume you are using 5 GPRs per thread, so you can have at most 51 wavefronts active (256 / 5 = 51.2, rounded down to 51).
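To make the arithmetic explicit, here is a minimal sketch of that occupancy estimate. It assumes the register file is the only limiting resource and takes the 256-register figure from the numbers above; the function name `max_wavefronts` is hypothetical and is not part of any vendor API.

```python
def max_wavefronts(gprs_available: int, gprs_per_thread: int) -> int:
    """Upper bound on active wavefronts when GPRs are the only limit (assumption)."""
    if gprs_per_thread <= 0:
        raise ValueError("gprs_per_thread must be positive")
    # Integer division: a partially fitting wavefront cannot be scheduled.
    return gprs_available // gprs_per_thread


print(max_wavefronts(256, 5))  # 51
```

Other limits (local memory, hardware caps on wavefronts per SIMD) could lower the real number, so this is only an upper bound under the stated assumption.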
When a wavefront completes, does a new wavefront allocate resources and get placed in the dispatcher right away? If so, how many wavefronts do you need beyond the number actually executing so that this is not noticeable in performance?
What I mean is: do wavefront batches (taking a batch to be the number that can run in parallel, i.e. dispatcher + executing) execute serially, or is there some overlap, with a new wavefront created as soon as an old one finishes?
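To pin down the two behaviours I am asking about, here is a toy model of each. It is only a sketch: it makes no claim about what the hardware actually does, durations are in arbitrary time units, and the names `batched_makespan` and `overlapped_makespan` are made up for illustration.

```python
import heapq


def batched_makespan(durations, slots):
    """Serial batches: the next batch starts only after the whole previous batch ends."""
    total = 0
    for i in range(0, len(durations), slots):
        # A batch takes as long as its slowest wavefront.
        total += max(durations[i:i + slots])
    return total


def overlapped_makespan(durations, slots):
    """Overlap: a slot is refilled as soon as the wavefront occupying it finishes."""
    finish_times = []  # min-heap of finish times, one entry per occupied slot
    for d in durations:
        if len(finish_times) < slots:
            start = 0  # a free slot is still available at time 0
        else:
            start = heapq.heappop(finish_times)  # earliest slot to free up
        heapq.heappush(finish_times, start + d)
    return max(finish_times) if finish_times else 0


durations = [4, 1, 1, 4]  # hypothetical wavefront run times
print(batched_makespan(durations, slots=2))     # 8
print(overlapped_makespan(durations, slots=2))  # 6
```

With uneven wavefront run times the overlapped model finishes sooner, which is why the distinction matters for how many extra wavefronts are worth having in flight.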