0 Replies Latest reply on Oct 19, 2009 10:36 AM by yingbo

    About the measurement of L2 Cache on Opteron

    yingbo

      I'm evaluating program performance on an AMD Opteron 270 with OProfile, following "Basic Performance Measurements for AMD Athlon™ 64, AMD Opteron™ and AMD Phenom™ Processors" by Paul J. Drongowski.

      Paul's article introduces two methods for measuring L2 cache behavior: a direct method and an indirect method.

      Direct method:
      L2 request rate = (L2_requests + L2_fill_write) / Ret_instructions
      L2 miss ratio = L2_misses / (L2_requests + L2_fill_write)
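
As a minimal sketch, this is how I read the direct method in Python. The function and parameter names are my own, not from Paul's article, and the inputs are assumed to be raw event counts (or sample counts taken with the same sampling period, so the period cancels in the ratios). The numbers in the usage lines are made up to show the arithmetic.

```python
def l2_direct(l2_requests, l2_fill_write, l2_misses, ret_instructions):
    """Direct method: return (L2 request rate, L2 miss ratio)."""
    # Total traffic seen by the L2: requests plus fills/writebacks.
    total_requests = l2_requests + l2_fill_write
    request_rate = total_requests / ret_instructions
    miss_ratio = l2_misses / total_requests
    return request_rate, miss_ratio


# Made-up counts, not measurements.
rate, ratio = l2_direct(l2_requests=800, l2_fill_write=200,
                        l2_misses=100, ret_instructions=10000)
print(rate, ratio)
```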

      Indirect method:
      IC_misses = IC_refills_L2 + IC_refills_sys
      DC_misses = DC_refills_L2 + DC_refills_sys
      L2_requests = IC_misses + DC_misses + L2_requests_TLB
      L2 request rate = L2_requests / Ret_instructions
      L2_misses = IC_refills_sys + DC_refills_sys + L2_misses_TLB
      L2 miss ratio = L2_misses / L2_requests
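
The indirect method can be sketched the same way. Again, the names are hypothetical and simply mirror the formulas above, and the counts in the usage lines are invented for illustration.

```python
def l2_indirect(ic_refills_l2, ic_refills_sys,
                dc_refills_l2, dc_refills_sys,
                l2_requests_tlb, l2_misses_tlb,
                ret_instructions):
    """Indirect method: return (L2 request rate, L2 miss ratio)."""
    # Every L1 miss becomes an L2 request, wherever it is finally served from.
    ic_misses = ic_refills_l2 + ic_refills_sys
    dc_misses = dc_refills_l2 + dc_refills_sys
    l2_requests = ic_misses + dc_misses + l2_requests_tlb
    # Refills that had to come from the system are the L2 misses.
    l2_misses = ic_refills_sys + dc_refills_sys + l2_misses_tlb
    return l2_requests / ret_instructions, l2_misses / l2_requests


# Made-up counts, not measurements.
rate, ratio = l2_indirect(ic_refills_l2=30, ic_refills_sys=10,
                          dc_refills_l2=40, dc_refills_sys=10,
                          l2_requests_tlb=10, l2_misses_tlb=5,
                          ret_instructions=1000)
print(rate, ratio)
```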

      I have some questions about L2 Cache measurement.

      1. How should L2_requests_TLB be computed in the indirect method?
      My understanding is that L2_requests_TLB equals the sum of L1_ITLB_MISS_AND_L2_ITLB_MISS and L1_DTLB_AND_L2_DTLB_MISS. However, event REQUESTS_TO_L2 also has a mask bit (0x4) for TLB. I measured both on mcf and vortex from SPEC2000.

      opcontrol --event=REQUESTS_TO_L2:50003:0x4 --event=L1_ITLB_MISS_AND_L2_ITLB_MISS:50003 --event=L1_DTLB_AND_L2_DTLB_MISS:50003 --image=mcf.exe,vortex.exe

      L1_DTLB_AND_L2_DTLB_MISS|REQUESTS_TO_L2:0x4|L1_ITLB_MISS_AND_L2_ITLB_MISS:50003|
        samples|      %|  samples|      %|  samples|      %|
      -------------------------------------------------------------------------
           1377 100.000      1664 100.000      0 100.000  mcf.exe
           1192 100.000        10 100.000      1816 100.000 vortex.exe

      There is a big discrepancy between REQUESTS_TO_L2:0x4 and (L1_ITLB_MISS_AND_L2_ITLB_MISS + L1_DTLB_AND_L2_DTLB_MISS). Which one is the appropriate value for L2_requests_TLB?
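
To make the gap concrete, here is a small Python sketch that puts the two candidates side by side. It assumes the sample columns appear in the same order as the header line of the output above. The dictionary keys are my own shorthand.

```python
# Sample counts copied from the opreport output above, assuming the
# columns follow the header order: DTLB miss, REQUESTS_TO_L2:0x4, ITLB miss.
samples = {
    "mcf.exe":    {"dtlb_miss": 1377, "req_l2_tlb": 1664, "itlb_miss": 0},
    "vortex.exe": {"dtlb_miss": 1192, "req_l2_tlb": 10,   "itlb_miss": 1816},
}


def tlb_candidates(row):
    """Return (REQUESTS_TO_L2:0x4 samples, ITLB miss + DTLB miss samples)."""
    return row["req_l2_tlb"], row["itlb_miss"] + row["dtlb_miss"]


for image, row in samples.items():
    mask_value, miss_sum = tlb_candidates(row)
    print(f"{image}: REQUESTS_TO_L2:0x4={mask_value} "
          f"ITLB+DTLB={miss_sum} difference={mask_value - miss_sum}")
```

All three events were sampled with the same period (50003), so the sample counts can be compared directly.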

      2. How should L2_requests be computed?

      Direct method: L2_requests + L2_fill_write
      Indirect method: IC_misses (IC_refills_L2 + IC_refills_sys) + DC_misses (DC_refills_L2 + DC_refills_sys) + L2_requests_TLB

      1) Direct method
      opcontrol --event=L2_CACHE_FILL_WRITEBACK:50003 --event=REQUESTS_TO_L2:50003:0x7 --image=mcf.exe,vortex.exe


      L2_CACHE_FILL_WRITEBACK|REQUESTS_TO_L2:0x7|
        samples|      %|  samples|      %|
      ------------------------------------
          16402 100.000     15920 100.000 mcf.exe
          11610 100.000     13761 100.000 vortex.exe

      L2_request_mcf_direct = 16402 + 15920 = 32322
      L2_request_vortex_direct = 11610 + 13761 = 25371

      2) Indirect method
      opcontrol --event=DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM:50003 --event=INSTRUCTION_CACHE_REFILLS_FROM_L2:50003 --event=INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM:50003 --event=REQUESTS_TO_L2:50003:0x4 --image=mcf.exe,vortex.exe

      INSTRUCTION_CACHE_REFILLS_FROM_L2|INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM|REQUESTS_TO_L2:0x4|DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM |
        samples|      %|  samples|      %|  samples|      %|  samples|      %|
      ------------------------------------------------------------------------
           2251 100.000        15 100.000      2160 100.000      7587 100.000 vortex.exe
           1 100.000           0 100.000      1660 100.000     10491 100.000 mcf.exe

      L2_request_vortex_indirect = 2251 + 15 + 2160 + 7587 = 12013
      L2_request_mcf_indirect = 1 + 0 + 1660 + 10491 = 12152

      There is a VERY BIG discrepancy between the L2_request values computed with the direct and indirect methods. Why?
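
The size of the gap can be sketched in Python as follows. The sample counts are copied from the two opreport outputs above, taking each row as labeled by its image name in the output. The dictionary keys are my own shorthand.

```python
# Direct method: L2_CACHE_FILL_WRITEBACK + REQUESTS_TO_L2:0x7 samples.
direct = {
    "mcf.exe":    {"fill_writeback": 16402, "requests_0x7": 15920},
    "vortex.exe": {"fill_writeback": 11610, "requests_0x7": 13761},
}

# Indirect method: IC refills (L2 and system) + REQUESTS_TO_L2:0x4
# + DATA_CACHE_REFILLS_FROM_L2_OR_SYSTEM samples, per opreport row.
indirect = {
    "vortex.exe": {"ic_l2": 2251, "ic_sys": 15, "req_tlb": 2160, "dc_refills": 7587},
    "mcf.exe":    {"ic_l2": 1,    "ic_sys": 0,  "req_tlb": 1660, "dc_refills": 10491},
}


def totals(table):
    """Sum all sample columns for each image."""
    return {image: sum(cols.values()) for image, cols in table.items()}


direct_total = totals(direct)
indirect_total = totals(indirect)

for image in sorted(direct_total):
    factor = direct_total[image] / indirect_total[image]
    print(f"{image}: direct={direct_total[image]} "
          f"indirect={indirect_total[image]} direct/indirect={factor:.2f}")
```

For both programs the direct total comes out at more than twice the indirect total.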

      3. Are the following statements right?

      1) INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM is equal to L2_CACHE_MISS:0x1.
      2) DATA_CACHE_REFILLS_FROM_SYSTEM is equal to L2_CACHE_MISS:0x2.

      Any suggestions are welcome! Choosing appropriate measurement parameters is important, and we should have a single, unified version.