Optimal number of wavefronts for kernel

Discussion created by boxerab on Dec 29, 2017
Latest reply on Jan 11, 2018 by bomby

My application runs a series of 7 kernels, and most of the time is taken by the 7th kernel.

This kernel has 50% occupancy.

Card is RX 470, 4GB.


For this 7th kernel, there are two settings: the first gives me a total of 100 wavefronts, while the second gives me a total of only 30 wavefronts.

The second setting takes about 3x as long as the first. VALU utilization is about the same for both.


I am guessing that the second setting is slower because 30 wavefronts are not enough to hide memory latency. Is there a way to calculate the optimal total number of wavefronts for a kernel, given the occupancy and the number of CUs?
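
One rough way to frame the question is to compare the total wavefront count against how many wavefronts the card can keep resident at once. The sketch below is only a back-of-the-envelope estimate, not a definitive answer. It assumes a GCN-style layout of 4 SIMDs per CU and at most 10 wavefronts per SIMD, and it assumes the RX 470 has 32 CUs; none of these figures come from the post itself, and the function names are made up for illustration.

```python
def resident_wavefront_capacity(num_cus, occupancy,
                                simds_per_cu=4, max_waves_per_simd=10):
    """Wavefronts the device can hold in flight at a given occupancy.

    simds_per_cu and max_waves_per_simd are assumed GCN values.
    occupancy is a fraction, e.g. 0.5 for 50%.
    """
    waves_per_simd = round(max_waves_per_simd * occupancy)
    return num_cus * simds_per_cu * waves_per_simd


def fill_fraction(total_wavefronts, capacity):
    """Fraction of the resident wavefront slots a launch can occupy."""
    return min(1.0, total_wavefronts / capacity)


NUM_CUS = 32  # assumption: CU count for the RX 470

capacity = resident_wavefront_capacity(NUM_CUS, occupancy=0.5)
print(f"resident capacity at 50% occupancy: {capacity} wavefronts")

for total in (100, 30):  # the two settings described in the post
    frac = fill_fraction(total, capacity)
    print(f"{total} wavefronts fill {frac:.1%} of the available slots")
```

Under these assumptions, both settings launch fewer wavefronts than there are SIMDs (32 x 4 = 128), so many SIMDs get no work at all and the rest have nothing to switch to while waiting on memory. A multiple of the resident capacity is a plausible lower bound to aim for, but the best value still has to be found by measuring.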