5 Replies Latest reply on Jan 11, 2018 7:30 AM by bomby

    Optimal number of wavefronts for a kernel


      My application runs a series of 7 kernels, and most of the time is taken by the 7th kernel.

      This kernel has 50% occupancy.

      Card is RX 470, 4GB.


      For this 7th kernel, there are two settings: the first gives me a total of 100 wavefronts,

      while the second gives me a total of only 30 wavefronts.

      The second setting runs about 3x slower than the first.  VALU utilization is about the same

      for both.


      I am guessing that the second setting is slower because 30 wavefronts are not enough to

      hide memory latency.  Is there a way to calculate the optimal total number of wavefronts for a kernel,

      given the occupancy and the number of CUs?
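
      A rough sketch of the arithmetic behind the question, not a definitive answer. It assumes the RX 470 has 32 CUs and that each GCN CU has 4 SIMDs holding at most 10 wavefronts each, and it reads "50% occupancy" as 5 of those 10 slots per SIMD. The function names are hypothetical. It only estimates how many wavefronts the card can keep resident at once; it does not model memory latency itself.

```python
import math

# Assumed GCN limits (not stated in the post): 4 SIMDs per CU,
# at most 10 wavefronts resident per SIMD.
SIMDS_PER_CU = 4
MAX_WAVES_PER_SIMD = 10


def resident_wavefront_capacity(num_cus, occupancy):
    """Wavefronts the whole GPU can hold at once at the given occupancy."""
    waves_per_simd = math.floor(MAX_WAVES_PER_SIMD * occupancy)
    return num_cus * SIMDS_PER_CU * waves_per_simd


def average_waves_per_simd(total_wavefronts, num_cus):
    """Average wavefronts per SIMD if the launch is spread evenly."""
    return total_wavefronts / (num_cus * SIMDS_PER_CU)


NUM_CUS = 32      # assumed CU count for the RX 470
OCCUPANCY = 0.5   # kernel occupancy reported in the post

capacity = resident_wavefront_capacity(NUM_CUS, OCCUPANCY)
print("resident capacity:", capacity)

for total in (100, 30):  # the two settings from the post
    per_simd = average_waves_per_simd(total, NUM_CUS)
    print(total, "wavefronts ->", round(per_simd, 2), "per SIMD on average")
```

      Under these assumptions, both settings launch far fewer wavefronts than the card can hold, and 30 wavefronts is fewer than the assumed 32 CUs, so some CUs would receive no work at all.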