I am using CodeXL to profile and analyze an OpenCL application on an AMD GPU (Radeon R9 series).
In the profile mode, some performance counters are provided but I cannot understand the exact meaning of some counters.
They are slightly ambiguous to me...
Please let me know if anyone knows the meaning of the counters below:
(1) CacheHit: The percentage of fetches that hit the data cache. Value range: 0% (no hit) to 100% (optimal).
==> Does this refer to the L1 data cache only, or does it report L1 and L2 combined?
(2) MemUnitBusy: The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound).
==> First of all, what exactly is the memory unit? Is it "Address unit + Data Return unit + L1 Vector Data Cache" in the compute unit? Or is it the memory partition, which consists of the L2 cache and the memory controller?
==> In addition, can I assert that "MemUnitBusy - MemUnitStalled" is the time when the memory unit is really doing something useful?
(3) MemUnitStalled: The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad).
==> What exactly does a stall mean here? Does it mean that the memory unit cannot accept new requests because it is still processing other requests, or because the resources it needs are not yet available?
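
To make the second part of question (2) concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes that the stall time is a subset of the busy time, as the MemUnitBusy description suggests; whether the difference really represents useful work is exactly what I am asking. The function name and the sample values are hypothetical and do not come from CodeXL or from a real profile.

```python
def non_stalled_busy_pct(mem_unit_busy_pct, mem_unit_stalled_pct):
    """Return the percentage of GPUTime the memory unit is busy but not stalled.

    Assumption (taken from the counter description, not verified):
    MemUnitBusy already includes MemUnitStalled, so the difference is the
    busy-but-not-stalled share of GPUTime.
    """
    for name, value in (("MemUnitBusy", mem_unit_busy_pct),
                        ("MemUnitStalled", mem_unit_stalled_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    if mem_unit_stalled_pct > mem_unit_busy_pct:
        # This would contradict the assumption that busy time includes stall time.
        raise ValueError("MemUnitStalled exceeds MemUnitBusy")
    return mem_unit_busy_pct - mem_unit_stalled_pct


# Hypothetical counter values, for illustration only.
print(non_stalled_busy_pct(80.0, 30.0))
```

If this interpretation is right, a kernel with a high MemUnitBusy but a small difference would be spending most of its memory time stalled rather than transferring data.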