2 Replies Latest reply on Jan 28, 2015 8:26 AM by jpola

    Number of active work_groups during the execution

    jpola

      Dear all,

       

      I'm wondering about the number of active work-groups during kernel execution on a GPU. Let's assume that my GPU has 20 compute units. On this GPU I'm executing a kernel whose work-group size is (256, 1, 1). The global work size is 20 * work-group size.

      Now, at a given time, what is the number of active work-groups being executed on the GPU? I'm just trying to understand the following statement from http://developer.amd.com/tools-and-sdks/archive/amd-app-profiler/user-guide/app-profiler-kernel-occupancy/

      Otherwise, if there is more than one wavefront (WF) per work-group (WG), there is an upper limit of 16 work-groups (WG) per compute unit (CU). Then, the maximum number of wavefronts on the compute unit is given by:

      In my case I have 4 wavefronts in each work-group.

       

      My questions are as follows:

      1. What exactly does "active wavefront" mean? There can be up to 40 wavefronts per CU. Does that mean that at a given time my GPU can process 40 * 20 * 64 = 51200 work-items? That number seems too big.

      2. Can I say that a work-group is scheduled to be executed on a CU, which means that each of my 20 CUs will execute one work-group?

      3. If each work-group is executed on a separate CU, as I think, can I say that this is the number of active work-groups during the execution?

       

      Thank you very much for your help,

      Kuba.

        • Re: Number of active work_groups during the execution
          gopal

          Hi Kuba,

           

          Number of active work-groups depends upon:

            a. work-group size

            b. resources consumed by each work-group (for example, registers and LDS usage)

            c. and the amount of resources the machine possesses

           

          As per your input, you have a GPU with 20 CUs and a work-group size of 256.

          I. What is the number of active work-groups being executed on a GPU?

          The number of active work-groups results from all three of a, b and c above. You have not mentioned anything about the kernel's resource usage, so I will not consider the effect of b and c.

          Considering only point a, and for a GCN card, you can have a total of 40 work-groups per CU if each work-group has only one wave-front.

          But for a work-group size of 256, the number of active work-groups per CU is limited to 16 (h/w limit).

          So, as per the shared link, the number of active wave-fronts per CU would be min(16*4, 40) = 40,

          and hence the number of active work-groups per CU should be = number of active wave-fronts per CU / number of wave-fronts per work-group = 40 / 4 = 10.
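
          The arithmetic above can be sketched in a few lines of Python. This is only an illustration of point a (work-group size); it ignores register and LDS usage entirely. The constants 64, 40 and 16 are the GCN figures quoted in this thread, and the function names are made up for the example.

```python
# Sketch of the per-CU limits discussed above (work-group size only).
# Assumed figures from this thread: 64 work-items per wave-front,
# at most 40 wave-fronts per CU, at most 16 work-groups per CU when
# a work-group holds more than one wave-front.
WAVEFRONT_SIZE = 64
MAX_WAVES_PER_CU = 40
MAX_GROUPS_PER_CU = 16


def waves_per_group(work_group_size):
    """Wave-fronts needed for one work-group (rounded up)."""
    return (work_group_size + WAVEFRONT_SIZE - 1) // WAVEFRONT_SIZE


def active_waves_per_cu(work_group_size):
    """Upper bound on in-flight wave-fronts per CU."""
    w = waves_per_group(work_group_size)
    if w == 1:
        # One wave-front per work-group: only the wave-front limit applies.
        return MAX_WAVES_PER_CU
    return min(MAX_GROUPS_PER_CU * w, MAX_WAVES_PER_CU)


def active_groups_per_cu(work_group_size):
    """Upper bound on in-flight work-groups per CU."""
    return active_waves_per_cu(work_group_size) // waves_per_group(work_group_size)


if __name__ == "__main__":
    print(waves_per_group(256), active_waves_per_cu(256), active_groups_per_cu(256))
```

          For a work-group size of 256 this gives 4 wave-fronts per work-group, 40 wave-fronts per CU and 10 work-groups per CU, matching the numbers above. Registers and LDS usage can only lower these bounds.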

           

          1. An active wave-front, also known as an in-flight wave-front, is a wave-front that has been launched by the scheduler; the number of concurrent wave-fronts depends on the work-group size and on the resource utilization of the kernel. Yes, there can be 40 WFs per CU. Yes, your GPU can have up to 20*40*64 = 51200 work-items in flight.

          2. Each CU can execute up to 40/4 = 10 concurrent work-groups.

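          3. Hence the total number of active work-groups during the execution can be up to 10*20 = 200 concurrent work-groups. Note that this is the capacity of the GPU; a global work size of 20 * 256 contains only 20 work-groups in total, so at most 20 can be active for your launch.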
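
          As a rough check of the device-wide numbers, here is a small Python sketch. It takes the per-CU figures from this thread (10 work-groups and 40 wave-fronts per CU, 64 work-items per wave-front) as given inputs, assumes the global work size is a multiple of the work-group size, and uses hypothetical function names.

```python
# Sketch: device-wide capacity versus what one launch actually contains.
# Per-CU figures are inputs taken from the discussion above, not queried
# from any real device.

def device_capacity(num_cus, groups_per_cu, waves_per_cu, wavefront_size=64):
    """Upper bounds on what the whole GPU can hold in flight."""
    return {
        "work_groups": num_cus * groups_per_cu,
        "wavefronts": num_cus * waves_per_cu,
        "work_items": num_cus * waves_per_cu * wavefront_size,
    }


def groups_in_launch(global_size, work_group_size):
    """Work-groups in a 1-D launch (global size assumed to be a multiple)."""
    return global_size // work_group_size


capacity = device_capacity(num_cus=20, groups_per_cu=10, waves_per_cu=40)
launched = groups_in_launch(global_size=20 * 256, work_group_size=256)

# A launch can never have more active work-groups than it contains.
max_active = min(launched, capacity["work_groups"])

if __name__ == "__main__":
    print(capacity, launched, max_active)
```

          With the numbers from the question, the capacity is 200 work-groups (51200 work-items), while the launch contains 20 work-groups. How those 20 are distributed over the CUs is decided by the hardware scheduler and is not modelled here.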

           

          Thanks