Archives Discussions

ryta1203 · ‎12-03-2010

Why has the fetch busy ratio changed so much?

For example, I have a piece of code that had a fetch busy of 26.33 with the old profiler and Catalyst 10.9, and now it has a fetch busy of ~86.

BTW, there is no speedup between the two versions whatsoever. Can AMD explain this, please?

Also, it looks like Fetch Busy and Fetch Stall are the same (which I'm sure is an error); it looks like someone is assigning the same value to both fields in the code. You guys should really use code review.

bpurnomo · ‎12-03-2010

The resulting counter values can vary considerably from one driver version to another because the driver settings and compiler optimizations may differ. Please check whether the resulting hardware shader (ISA) is the same in both versions.

FetchUnitBusy can report the same value as FetchUnitStalled because the value reported in FetchUnitBusy includes the value in FetchUnitStalled. If they report the same value, it means that stall time accounts for all of the time the fetch unit is busy.

If you see no speedup in your application even though the counter values are improved, it typically means that the bottleneck for your kernel is somewhere else.
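The relationship between the two counters can be made concrete with a little arithmetic. This is only a minimal sketch, assuming both counters are percentages over the same time base and that FetchUnitBusy includes FetchUnitStalled as described above; the function name `fetch_breakdown` is hypothetical and not part of the profiler.

```python
def fetch_breakdown(fetch_busy_pct, fetch_stalled_pct):
    """Split a FetchUnitBusy reading into its non-stalled part.

    Assumption: both values are percentages of the same time base and
    FetchUnitBusy already includes FetchUnitStalled.
    Returns (non_stalled_pct, stalled_share_of_busy).
    """
    if not (0.0 <= fetch_stalled_pct <= fetch_busy_pct <= 100.0):
        raise ValueError("expected 0 <= stalled <= busy <= 100")
    # Time the fetch unit is busy without being stalled.
    non_stalled_pct = fetch_busy_pct - fetch_stalled_pct
    # Fraction of the busy time that is spent stalled.
    if fetch_busy_pct > 0:
        stalled_share = fetch_stalled_pct / fetch_busy_pct
    else:
        stalled_share = 0.0
    return non_stalled_pct, stalled_share


# Equal counters: all busy time is stall time.
print(fetch_breakdown(67.0, 67.0))  # (0.0, 1.0)
```

With equal readings the non-stalled part is zero, which is the case discussed in the rest of this thread.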

ryta1203 · ‎12-03-2010

Originally posted by: bpurnomo The resulting counter values can vary considerably from one driver version to another because the driver settings and compiler optimizations may differ. Please check whether the resulting hardware shader (ISA) is the same in both versions.

FetchUnitBusy can report the same value as FetchUnitStalled because the value reported in FetchUnitBusy includes the value in FetchUnitStalled. If they report the same value, it means that stall time accounts for all of the time the fetch unit is busy.

If you see no speedup in your application even though the counter values are improved, it typically means that the bottleneck for your kernel is somewhere else.

Two things here:

1. If fetch busy is 67% and fetch stalled is 67%, then there is no time when the fetch unit is actually doing anything except stalling. Is that accurate? So if they are equal, the time the fetch unit is busy is time spent being stalled (i.e., doing nothing). If this is an inaccurate statement, then you should consider renaming your counters, since this makes no sense.

Also, this is occurring on every sample I have tried so far. Must it just be coincidence?

I also have some of my own kernels that report 100 fetch busy and 100 fetch stalled... please see the paragraph above. So for 100% of the time the fetch unit is busy, it is busy being stalled? Again, this makes no sense.

2. I'm sure the ISAs are not the same; I expect some optimizations/de-optimizations have probably occurred. For example, the Fetch busy in Max Transpose has gone up significantly while the Fetch busy in DCT has gone down significantly. I believe that the write in Max Trans is the bottleneck and something else (maybe ALU?) is the bottleneck in DCT.

Please answer these if you can. Thank you.

bpurnomo · ‎12-05-2010

Yes, the statement is accurate.

If it appears on every sample that you have tried, that does sound like a bug. I tried it with an internal build of the profiler and this problem does not occur anymore. Thank you for the bug report.
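One way to tell a coincidence from a reporting problem is to check whether the two counters match on every sample rather than on a few. The following is a rough sketch only, assuming the counter values have been copied by hand into a dictionary; the function name, the data layout, and the kernel names are hypothetical and not part of the profiler.

```python
def counters_always_equal(samples, tol=1e-6):
    """Return True if busy == stalled (within tol) for every sample.

    samples: dict mapping a kernel name to a
             (fetch_busy_pct, fetch_stalled_pct) tuple.
    An empty dict returns False, since there is nothing to conclude.
    """
    return bool(samples) and all(
        abs(busy - stalled) <= tol for busy, stalled in samples.values()
    )


# Hypothetical kernel names; the values are the ones quoted in this thread.
readings = {
    "kernel_a": (67.0, 67.0),
    "kernel_b": (100.0, 100.0),
}
if counters_always_equal(readings):
    print("Busy and stalled match on every sample: suspicious.")
```

A match on every sample does not prove a bug, but it is a reasonable trigger for comparing against another profiler version.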

ryta1203 · ‎12-05-2010

So am I to assume, then, that of the two numbers, the one being reported is actually Fetch Busy and not Fetch Stall?

This makes sense to me.

ryta1203 · ‎12-06-2010

Originally posted by: bpurnomo Yes, the statement is accurate.

If it appears on every sample that you have tried, that does sound like a bug. I tried it with an internal build of the profiler and this problem does not occur anymore. Thank you for the bug report.

Yes, after rolling back to profiler 1.4, it appears that profiler 2.0 has an issue with some of the memory counters (fetch busy/stall, cache, etc.).

Using profiler 1.4 I get the same results for the samples using Catalyst 10.11 that I got for Catalyst 10.9, so it looks like the profiler is pretty bugged, as you said.

himanshu_gautam · ‎12-07-2010

ryta,

Can you tell us which samples are giving erroneous counter values in profiler 2.0?

Catalyst 10.11 / Profiler 2.0 Fetch Busy ratio