The GPU scheduler can assign multiple work-groups to a single CU if GPU resources (VGPRs, LDS, etc.) are available. More precisely, the ACEs (Asynchronous Compute Engines) are responsible for compute-shader scheduling and resource allocation: they manage compute tasks and GPU resources, and accordingly create and dispatch work-groups to individual CUs for execution. This process is highly dynamic.
So I believe the GPU scheduler will automatically run two or more work-groups on the same CU if:
- the kernel's resource usage allows multiple work-groups to execute concurrently on one CU,
AND
- enough work-groups are available in the dispatch.
From an OpenCL programming perspective, device fission could be used in such a case; however, device fission is currently not supported for AMD GPUs. Other than that, I'm not aware of any OpenCL feature that can be used to control the association between work-groups and CUs.