
Need for more workgroups

Question asked by skanur on Sep 23, 2014
Latest reply on Sep 24, 2014 by skanur

Hi,

 

I have two questions that I can best explain using the following scenario.

 

Scenario:

Let's say I have a kernel that can be run with a workgroup of 64 work-items, i.e. 1 wavefront. I get this number from the clGetKernelWorkGroupInfo API of OpenCL, which I assume calculates it based on register allocation. From the same API I can also get the kernel's local memory usage. Dividing the total local memory per compute unit (x 2 for the GCN architecture) by the kernel's local memory usage gives me the maximum number of workgroups I can fit per compute unit (CU). Multiplying that by the number of CUs gives the number of workgroups I can fit on the GPU; let's call this number "workgroup-gpu". (A rough sketch of this calculation follows below.)
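To make the arithmetic concrete, here is roughly how I compute these numbers on the host side. This is only a sketch: it assumes an already-built cl_kernel `kernel` and a cl_device_id `device`, and the x2 factor for GCN is my own assumption, not something the API reports.

    /* Sketch of the occupancy estimate described above.
     * Assumes `kernel` and `device` are already set up.
     * The factor 2 below is my GCN assumption, not an API-reported value. */
    #include <CL/cl.h>
    #include <stdio.h>

    void estimate_workgroups(cl_kernel kernel, cl_device_id device)
    {
        size_t   wg_size    = 0;  /* max work-items per workgroup for this kernel */
        cl_ulong kernel_lds = 0;  /* local memory used by the kernel, in bytes */
        cl_ulong device_lds = 0;  /* local memory available per CU, in bytes */
        cl_uint  num_cus    = 0;  /* number of compute units on the device */

        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(wg_size), &wg_size, NULL);
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                                 sizeof(kernel_lds), &kernel_lds, NULL);
        clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE,
                        sizeof(device_lds), &device_lds, NULL);
        clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof(num_cus), &num_cus, NULL);

        /* Workgroups per CU limited by local memory (x2 assumed for GCN). */
        cl_ulong wg_per_cu = (kernel_lds > 0) ? (2 * device_lds) / kernel_lds : 0;
        cl_ulong wg_gpu    = wg_per_cu * num_cus;  /* "workgroup-gpu" */

        printf("workgroup size         : %zu\n", wg_size);
        printf("workgroups per CU      : %llu\n", (unsigned long long)wg_per_cu);
        printf("workgroups on the GPU  : %llu\n", (unsigned long long)wg_gpu);
    }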

 

Questions:

  1. I remember reading on these forums that only one workgroup executes at a time on a CU. So how do extra workgroups per CU help hide memory latency?
  2. Is there any other reason to launch more than "workgroup-gpu" workgroups on the GPU, given that the rest are executed sequentially?
