The number of active work-groups depends on:
a. the work-group size
b. the resources consumed by each work-group (for example, registers and LDS usage)
c. the amount of resources the machine possesses
As per your input, you have a GPU with 20 CUs, and the work-group size you mentioned is 256.
I. What is the number of active work-groups being executed on a GPU?
The number of active work-groups is determined by a, b and c together. You have not mentioned anything about the kernel's resource usage, so I will not consider the effect of b and c.
Considering only point a, on a GCN card you can have a total of 40 work-groups per CU if each work-group has only one wave-front.
But for a work-group size of 256, which is 256/64 = 4 wave-fronts per work-group, the number of active work-groups per CU is limited to 16 (h/w limit).
So, as per the shared link, the number of active wave-fronts per CU would be min(16*4, 40) = 40,
and hence the number of active work-groups per CU should be = number of active wave-fronts per CU / wave-fronts per work-group = 40 / 4 = 10.
1. An active wave-front is also known as an in-flight wave-front, meaning a wave-front that the scheduler has launched and that is resident concurrently with others; how many there are depends on the work-group size and on the kernel's resource utilization. Yes, there can be 40 WFs per CU. Yes, your GPU can have 20*40*64 = 51200 work-items in flight at once.
2. Each CU will then execute 40/4 = 10 concurrent work-groups.
3. Hence the total number of active work-groups during execution is 10*20 = 200.
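The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the answer: a GCN card with 64 work-items per wave-front, at most 40 wave-fronts per CU, at most 16 work-groups per CU when a work-group spans more than one wave-front, and no limit from registers or LDS (points b and c are ignored). The function and constant names are made up for this example and are not part of any AMD API.

```python
# Limits assumed from the discussion above (GCN); not queried from a device.
WAVEFRONT_SIZE = 64      # work-items per wave-front
MAX_WAVES_PER_CU = 40    # maximum in-flight wave-fronts per CU
MAX_GROUPS_PER_CU = 16   # h/w limit when a work-group has more than one wave-front


def active_work_groups(num_cus, work_group_size):
    """Estimate concurrency from work-group size only (point a).

    Register and LDS usage (points b and c) are deliberately ignored,
    so the result is an upper bound, not a measured occupancy.
    """
    # Wave-fronts needed per work-group, rounded up.
    waves_per_group = -(-work_group_size // WAVEFRONT_SIZE)

    if waves_per_group == 1:
        # Single-wave work-groups: bounded only by the wave-front limit.
        groups_per_cu = MAX_WAVES_PER_CU
    else:
        # Multi-wave work-groups: bounded by both limits.
        groups_per_cu = min(MAX_GROUPS_PER_CU,
                            MAX_WAVES_PER_CU // waves_per_group)

    waves_per_cu = groups_per_cu * waves_per_group
    return {
        "waves_per_group": waves_per_group,
        "waves_per_cu": waves_per_cu,
        "groups_per_cu": groups_per_cu,
        "total_groups": groups_per_cu * num_cus,
        "total_work_items": waves_per_cu * WAVEFRONT_SIZE * num_cus,
    }


# The case from the question: 20 CUs, work-group size 256.
result = active_work_groups(num_cus=20, work_group_size=256)
print(result)
```

For 20 CUs and a work-group size of 256 this gives 10 work-groups per CU, 200 work-groups in total and 51200 work-items, matching the numbers above. A real kernel may achieve less once its register and LDS usage is taken into account.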
Thank you, Gopal, for the detailed answer. That helps me a lot in understanding certain things.