2 Replies Latest reply on Oct 14, 2009 1:06 PM by pdrongowski

    How to relate execution time to the CPU_CLK_UNHALTED sampling?


      Hi @ll,

      I used OProfile on a 2.0 GHz AMD Opteron.

      opcontrol --event=CPU_CLK_UNHALTED:3001 --image=gzip.exe

      The sampling result is 902,645 samples. The execution time of gzip.exe is 9.281 s (Linux time command).

      Estimated execution time = sampling result * sampling interval / CPU frequency
      902,645 * 3001 / 2,000,000,000 = 1.35 s. That is far from the 9.281 s measured, so there is a big discrepancy between the two times.
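
      The arithmetic above can be reproduced with a short Python sketch. It uses only the numbers from this post and assumes the clock runs at a constant, nominal 2.0 GHz; the function name is made up for illustration.

```python
# Numbers from the post. The 2.0 GHz figure is the nominal clock and is
# assumed to be constant (no frequency throttling).
samples = 902_645          # CPU_CLK_UNHALTED samples reported by OProfile
period = 3001              # events per sample (the count after the colon)
cpu_hz = 2_000_000_000     # nominal clock frequency in Hz
wall_s = 9.281             # elapsed time from the time command

def estimated_seconds(samples, period, cpu_hz):
    """Unhalted cycles implied by the samples, converted to seconds."""
    return samples * period / cpu_hz

est = estimated_seconds(samples, period, cpu_hz)
print(f"estimated: {est:.2f} s, measured: {wall_s:.3f} s, ratio: {est / wall_s:.2f}")
```

      With these inputs the estimate covers only about 15% of the measured wall-clock time.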

      The event CPU_CLK_UNHALTED means "CPU Clocks Not Halted". Does OProfile count the latency caused by cache misses or I/O? These events can cause the CPU to halt. If it does not, the CPU_CLK_UNHALTED sampling result makes little sense and cannot represent program performance.

      But the document "Basic Performance Measurements for AMD Athlon™ 64, AMD Opteron™ and AMD Phenom™ Processors" says "IPC = Ret_instructions / CPU_clocks", which seems to imply that CPU_CLK_UNHALTED does count cache-miss and I/O waiting time. Is that right?

      BTW, when I changed the event count to 10,001 the sampling result was 752,647, and with 100,001 it was 167,612. Why doesn't the sampling result scale with the event count?
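
      One way to frame the scaling question: if no samples were lost, samples * period should come out roughly the same in all three runs. Here is a minimal sketch that computes that product, assuming the two larger event counts are 10001 and 100001 and the clock is a constant 2.0 GHz (the helper name is hypothetical):

```python
CPU_HZ = 2_000_000_000  # nominal 2.0 GHz clock, assumed constant

# (sampling period, samples reported) for the three runs in the post
runs = [(3001, 902_645), (10_001, 752_647), (100_001, 167_612)]

def implied_cycles(period, samples):
    """Total unhalted cycles implied by one run: one sample per `period` events."""
    return period * samples

for period, samples in runs:
    cycles = implied_cycles(period, samples)
    print(f"period {period:>7}: {cycles:.3e} cycles, {cycles / CPU_HZ:.2f} s")
```

      For these numbers the implied time grows with the period (about 1.35 s, 3.76 s and 8.38 s), so the three runs do not agree with each other.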

      I'm confused...

      Any suggestion is welcome.

        • How to relate execution time to the CPU_CLK_UNHALTED sampling?

          My guess is that 3001 is too small a period for the CPU_CLK_UNHALTED event, so many samples will be lost during profiling. Please check /var/lib/oprofile/samples/oprofiled.log to see whether any samples were lost due to overflow etc.
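
          A rough way to scan that log is sketched below in Python. The exact layout of oprofiled.log is not shown in this thread, so the sketch assumes each statistic is a single line that ends in an integer count. The function name and the sample lines are made up for illustration and are not copied from a real log.

```python
import re

def lost_sample_lines(log_text):
    """Return (line, count) pairs for lines that mention "lost" and end in a number.

    Assumption: one statistic per line, formatted roughly as
    "<description>: <count>". Adjust the pattern if your log differs.
    """
    hits = []
    for line in log_text.splitlines():
        match = re.search(r"lost.*?(\d+)\s*$", line, flags=re.IGNORECASE)
        if match:
            hits.append((line.strip(), int(match.group(1))))
    return hits

# Made-up lines for illustration only; not real oprofiled output.
example = """\
some counter received: 1000
some counter lost due to overflow: 12
another counter lost: 0
"""
for line, count in lost_sample_lines(example):
    if count > 0:
        print(line)
```

          To use it on a real run, read the contents of /var/lib/oprofile/samples/oprofiled.log (which may require root) and pass the text to the function.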

          • How to relate execution time to the CPU_CLK_UNHALTED sampling?

            Hi --

            I agree with Lei, a sampling period of 3001 is _way_ too aggressive for a high frequency event like CPU_CLK_UNHALTED. I've found that a sampling period of 100,000 is a practical limit for this event and RETIRED_INSTRUCTIONS. The data collection overhead increases rapidly under 100,000 and the interrupt handling (to collect samples) pollutes the caches/TLB/branch history tables. The pollution affects the workload behavior and the workload behavior is no longer "representative."
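
            To put the two periods side by side, here is a small Python sketch of the interrupt rate each one implies. It assumes the 2.0 GHz clock from the original post and a CPU that never halts, so these are upper bounds; the function name is hypothetical.

```python
CPU_HZ = 2_000_000_000  # 2.0 GHz Opteron from the original post, assumed fully busy

def samples_per_second(period, cpu_hz=CPU_HZ):
    """Sampling interrupts per second if the clock never halts."""
    return cpu_hz / period

for period in (3001, 100_000):
    rate = samples_per_second(period)
    print(f"period {period:>7}: {rate:,.0f} interrupts/s, one every {1e6 / rate:.1f} us")
```

            At a period of 3001 that works out to roughly 666,000 interrupts per second, or one about every 1.5 microseconds, versus 20,000 per second at 100,000.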

            Generally, the CPU clock is not halted for cache misses. The load-to-use latency for a cache miss is usually pretty short, so a clock halt is not necessary. I/O is another issue because the CPU idle period is much longer.

            IPC is a local measure of performance and indicates instruction-level parallelism within a small local neighborhood (like a hot loop). Yep, IPC can be affected by a halted clock. However, if IPC is applied within a tight compute bound loop, it can still be an effective performance measurement.
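
            As a sketch of how IPC could be estimated from sampled data: each event count is approximated as samples * period, then the two are divided. The function name and the sample counts below are hypothetical and chosen only to show the arithmetic; they are not measurements.

```python
def ipc_from_samples(instr_samples, instr_period, clk_samples, clk_period):
    """IPC = retired instructions / unhalted CPU clocks, each estimated as samples * period."""
    instructions = instr_samples * instr_period
    clocks = clk_samples * clk_period
    return instructions / clocks

# Hypothetical sample counts for one hot loop, both events sampled at 100,000.
ipc = ipc_from_samples(instr_samples=1_500, instr_period=100_000,
                       clk_samples=1_000, clk_period=100_000)
print(f"IPC ~ {ipc:.2f}")
```

            The estimate is only as good as the two sample counts, so it is most meaningful when both are taken over the same compute-bound region.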

            We also recommend disabling clock frequency throttling. If the clock speed is throttled, not all cycles will be the same length of time! You may need to disable any power management software that changes the clock frequency.
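
            On Linux systems that expose the cpufreq interface in sysfs, one quick check is to read each core's scaling governor. The Python sketch below assumes the standard /sys/devices/system/cpu layout; the function name is hypothetical, and the result is simply empty if cpufreq is not present.

```python
from pathlib import Path

def scaling_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each cpuN directory to its cpufreq scaling governor, if exposed."""
    root = Path(cpu_root)
    governors = {}
    if not root.is_dir():
        return governors
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        # gov_file is <root>/cpuN/cpufreq/scaling_governor, so go up two levels for cpuN
        governors[gov_file.parent.parent.name] = gov_file.read_text().strip()
    return governors

for cpu, governor in scaling_governors().items():
    # A governor other than "performance" may change the clock during a run.
    print(cpu, governor)
```

            If a governor that varies the frequency is reported, converting cycle counts to seconds with a single nominal frequency is unreliable.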

            Yep, I agree -- using CPU_CLK_UNHALTED can be tricky!

            -- pj