In GCN, each SIMD can have up to 10 in-flight (active) wavefronts, so a CU with four SIMDs can hold up to 40 active wavefronts. In general, a higher number of active wavefronts (higher occupancy) helps hide memory latency and thus improves overall performance. The right value depends on several factors, such as ALU and memory usage, memory bandwidth, and application logic. For example, higher occupancy may help a memory-heavy application more than an ALU-bound one. If increasing the occupancy does not improve performance, the GPU already has enough active wavefronts to hide the latency. As the AMD OpenCL optimization guide says:
Increasing the wavefronts/compute unit does not indefinitely improve performance; once the GPU has enough wavefronts to hide latency, additional active wavefronts provide little or no performance benefit. A closely related metric to wavefronts/compute unit is “occupancy,” which is defined as the ratio of active wavefronts to the maximum number of possible wavefronts supported by the hardware.
For more information, please refer to the "OpenCL Optimization" section of the ROCm documentation.
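
As a rough illustration of the occupancy definition quoted above, here is a small Python sketch. It only uses the numbers mentioned in this answer (10 wavefronts per SIMD, 40 per CU, which implies 4 SIMDs per CU); the function and constant names are made up for the example and do not come from any AMD tool or API.

```python
# Minimal sketch of the occupancy ratio for a GCN compute unit.
# Constants are taken from the figures in this answer; names are hypothetical.

MAX_WAVES_PER_SIMD = 10  # GCN limit on in-flight wavefronts per SIMD
SIMDS_PER_CU = 4         # implied by 40 wavefronts per CU / 10 per SIMD


def max_waves_per_cu() -> int:
    """Maximum number of active wavefronts a single CU can hold."""
    return MAX_WAVES_PER_SIMD * SIMDS_PER_CU


def occupancy(active_waves_per_simd: int) -> float:
    """Ratio of active wavefronts to the hardware maximum (0.0 to 1.0)."""
    if not 0 <= active_waves_per_simd <= MAX_WAVES_PER_SIMD:
        raise ValueError("active wavefronts per SIMD must be between 0 and 10")
    return active_waves_per_simd / MAX_WAVES_PER_SIMD


if __name__ == "__main__":
    for waves in (2, 5, 10):
        per_cu = waves * SIMDS_PER_CU
        print(f"{waves} waves/SIMD = {per_cu} waves/CU, "
              f"occupancy {occupancy(waves):.0%}")
```

Note that this only computes the ratio. How many wavefronts a given kernel can actually keep active depends on its resource usage, and whether a higher ratio helps still has to be measured on the real workload.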