6 Replies Latest reply on Dec 7, 2010 4:51 PM by himanshu.gautam

    Catalyst 10.11 / Profiler 2.0 Fetch Busy ratio

    ryta1203

      Why has the fetch busy ratio changed so much?

      For example, I have a piece of code that had a fetch busy of 26.33 with the old profiler and Catalyst 10.9, and now it has a fetch busy of ~86.

      BTW, there is no speedup between the two versions whatsoever. Can AMD explain this, please?

      Also, it looks like Fetch Busy and Fetch Stall report the same value, which I'm sure is an error. It looks like someone is assigning the same value to both fields in the code. You guys should really use code review.

        • Catalyst 10.11 / Profiler 2.0 Fetch Busy ratio
          bpurnomo

          The resulting counter values can vary considerably from one driver version to another because the driver settings and compiler optimizations may be different.  Please check whether the resulting hardware shader (ISA) is the same between them.
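One way to run that ISA check is to save the ISA text for the kernel under each driver version and diff the two dumps. The sketch below is only an illustration: it assumes you already have both dumps as plain text, the labels `cat_10.9.isa` and `cat_10.11.isa` are hypothetical, and the sample strings are placeholders rather than real ISA.

```python
import difflib


def isa_diff(old_isa, new_isa):
    """Return unified-diff lines between two ISA text dumps.

    An empty list means the dumps match after stripping blank lines
    and leading/trailing whitespace.
    """
    old_lines = [ln.strip() for ln in old_isa.splitlines() if ln.strip()]
    new_lines = [ln.strip() for ln in new_isa.splitlines() if ln.strip()]
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="cat_10.9.isa",   # hypothetical label for the old dump
            tofile="cat_10.11.isa",    # hypothetical label for the new dump
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Placeholder text standing in for two real ISA dumps.
    old_dump = "instr A\ninstr B\n"
    new_dump = "instr A\ninstr C\n"
    for line in isa_diff(old_dump, new_dump):
        print(line)
```

If the diff is empty, the compiler output did not change and the counter difference more likely comes from driver settings or from how the profiler computes the counter.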

          FetchUnitBusy can report the same value as FetchUnitStalled because the value reported in FetchUnitBusy includes the value in FetchUnitStalled.  If they report the same value, it means that stalling accounts for essentially all of the time the fetch unit is busy.
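To make the relationship concrete, here is a small arithmetic sketch. It assumes both counters are percentages over the same time base and that FetchUnitStalled is a subset of FetchUnitBusy, as described above; the helper name `fetch_breakdown` is made up for this example and is not part of the profiler.

```python
def fetch_breakdown(busy_pct, stalled_pct):
    """Split FetchUnitBusy into its stalled and non-stalled parts.

    Assumes both values are percentages of the same total time and that
    the stalled time is included in the busy time (stalled <= busy).
    Returns (non_stalled_busy_pct, stalled_share_of_busy).
    """
    if not (0.0 <= stalled_pct <= busy_pct <= 100.0):
        raise ValueError("expected 0 <= stalled <= busy <= 100")
    non_stalled = busy_pct - stalled_pct
    # Fraction of the busy time that is spent stalled (0.0 if never busy).
    stalled_share = stalled_pct / busy_pct if busy_pct else 0.0
    return non_stalled, stalled_share


if __name__ == "__main__":
    # Equal counters: all of the busy time is stall time.
    print(fetch_breakdown(67.0, 67.0))
    # Half of the busy time is stall time.
    print(fetch_breakdown(86.0, 43.0))
```

Under these assumptions, equal counter values leave zero non-stalled busy time, which is the "stalling dominates" case.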

          If you see no speedup in your application even though the counter values are improved, it typically means that the bottleneck for your kernel is somewhere else.

            • Catalyst 10.11 / Profiler 2.0 Fetch Busy ratio
              ryta1203

               

              Originally posted by: bpurnomo The resulting counter values can vary considerably from one driver version to another because the driver settings and compiler optimizations may be different.  Please check whether the resulting hardware shader (ISA) is the same between them.

              FetchUnitBusy can report the same value as FetchUnitStalled because the value reported in FetchUnitBusy includes the value in FetchUnitStalled.  If they report the same value, it means that stalling accounts for essentially all of the time the fetch unit is busy.

              If you see no speedup in your application even though the counter values are improved, it typically means that the bottleneck for your kernel is somewhere else.

              Two things here:

              1. If the fetch busy is 67% and the fetch stalled is 67%, then the fetch unit is never doing anything except stalling while it is busy? Is that accurate? So if they are equal, the time the fetch unit is busy means it is busy being stalled (i.e. doing nothing). If this is an inaccurate statement, then you should consider renaming your counters, since this makes no sense.

              Also, this is occurring on every sample I have tried so far. Is that just a coincidence?

              I also have some of my own kernels that report 100 fetch busy and 100 fetch stalled (please see the paragraph above). So for 100% of the time the fetch unit is busy, it is busy being stalled? Again, this makes no sense.

              2. I'm sure the ISAs are not the same; I expect some optimizations/de-optimizations have occurred. For example, the Fetch busy in Max Transpose has gone up significantly while the Fetch busy in DCT has gone down significantly. I believe that the write in Max Transpose is the bottleneck and something else (maybe ALU?) is the bottleneck in DCT.

              Please answer these if you can. Thank you.