Hi, could anybody explain exactly how the workgroup scheduler works on GCN? I'm especially interested in the case where the workgroup size is smaller than 256 (e.g. 128 or 64). I understand that a CU can run several small workgroups at the same time, but in what order are they assigned? Does the scheduler pack each CU tightly, one CU after another, or does it distribute workgroups across the whole row of CUs to balance memory access (which I think would be the better approach)? I really need to know this for memory access optimization.
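To make the question concrete, here is a toy model of the two placement orders described above. Everything in it is hypothetical: `pack_tightly`, `round_robin` and the `slots_per_cu` parameter (how many small workgroups one CU is assumed to hold at once) are illustrative names, and neither function is claimed to reflect what GCN hardware actually does.

```python
# Hypothetical illustration only: two placement orders a workgroup scheduler
# *might* use. Neither is claimed to describe the real GCN dispatcher.
from typing import Dict, List


def _check_capacity(num_groups: int, num_cus: int, slots_per_cu: int) -> None:
    # Model a single dispatch round: every workgroup must fit at once.
    if num_groups > num_cus * slots_per_cu:
        raise ValueError("more workgroups than assumed CU slots")


def pack_tightly(num_groups: int, num_cus: int, slots_per_cu: int) -> Dict[int, List[int]]:
    """Fill CU 0 up to slots_per_cu workgroups, then CU 1, and so on."""
    _check_capacity(num_groups, num_cus, slots_per_cu)
    assignment: Dict[int, List[int]] = {cu: [] for cu in range(num_cus)}
    for wg in range(num_groups):
        assignment[wg // slots_per_cu].append(wg)
    return assignment


def round_robin(num_groups: int, num_cus: int, slots_per_cu: int) -> Dict[int, List[int]]:
    """Spread consecutive workgroups across all CUs before reusing any CU."""
    _check_capacity(num_groups, num_cus, slots_per_cu)
    assignment: Dict[int, List[int]] = {cu: [] for cu in range(num_cus)}
    for wg in range(num_groups):
        assignment[wg % num_cus].append(wg)
    return assignment


if __name__ == "__main__":
    print(pack_tightly(8, 4, 4))  # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [], 3: []}
    print(round_robin(8, 4, 4))   # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

With 8 workgroups on 4 CUs, tight packing puts workgroups 0-3 on CU 0 and leaves CUs 2 and 3 idle, while round-robin gives every CU two workgroups, so neighbouring workgroups (which often touch neighbouring memory) end up on different CUs.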