I have read many threads about this topic here, and one of the best is probably this one:
However, a number of simple questions are still outstanding:
a) It seems conclusive that wavefronts execute within a Thread Group. The Thread Group size is defined by the gridBlock.width parameter of the CALprogramGrid structure, and the number of Thread Groups is the domain execution size (in pixels) divided by the Thread Group size. Is that correct?
b) If the Thread Group size is twice the number of execution units (for 7xx the number of execution units seems to be 64), and that size is set both in the kernel and in gridBlock.width, will the Thread Group queue 2 wavefronts on the same SIMD, staying within the same Thread Group without interruption?
c) If fence_ works per Group, and the Group size is larger than the number of available execution units so that execution is split into 2 wavefronts (case b above), will the first wavefront be held at the barrier until the second wavefront reaches it, and only then continue? Or is it simply an incorrect setting to have a Group size greater than the number of execution units per SIMD?
d) If the wavefront size is set to ½ of the execution units of a SIMD, will half of the SIMD be wasted, or will another Group be started on the other half of the SIMD?
e) If more Groups are set than there are available SIMDs, will the groups be scheduled for execution one after another, in some unpredictable order, until all have finished?
f) Once a wavefront has finished executing, does the LDS content persist between wavefront runs, so that the next Thread Group finds the LDS content left by the previous wavefront and can reuse it?
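
To make the arithmetic behind a), b) and d) concrete, here is how I currently understand the counts. This is only a sketch in plain C that does not call CAL at all. The wavefront size of 64 is my assumption taken from the 7xx figure above, the helper names are made up for illustration, and rounding up when the domain is not a multiple of the group size is also an assumption rather than documented behaviour.

```c
#include <assert.h>

/* Assumption: 64 work-items per wavefront (the figure quoted above for 7xx). */
#define WAVEFRONT_SIZE 64u

/* Hypothetical helper: ceiling division for unsigned values. */
unsigned ceil_div(unsigned n, unsigned d)
{
    assert(d > 0u);
    return n / d + (n % d != 0u ? 1u : 0u);
}

/* Question a): thread groups needed to cover a 1D domain of domain_width
 * pixels when each group holds group_width work-items. */
unsigned thread_group_count(unsigned domain_width, unsigned group_width)
{
    return ceil_div(domain_width, group_width);
}

/* Questions b) and d): wavefronts that one thread group would occupy.
 * A group smaller than a wavefront still counts as one wavefront here. */
unsigned wavefronts_per_group(unsigned group_width)
{
    return ceil_div(group_width, WAVEFRONT_SIZE);
}
```

For example, a domain 1024 pixels wide with a group size of 128 gives 8 groups of 2 wavefronts each, and a group size of 32 still gives 1 wavefront per group. The sketch only counts; whether those 2 wavefronts stay on the same SIMD, and what happens to the unused half in the 32 case, is exactly what questions b) to d) are asking.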