0 Replies Latest reply on Jun 10, 2013 10:39 AM by pjbear

    Monitoring performance changes according to L3 Cache Partitioning

    pjbear

      Hi everyone!


      Using the performance monitoring counters, the L3 cache partitioning feature that AMD provides, and SpecCPU2006, I've been measuring how performance changes with the number of L3 subcaches allocated to a core.

       

      When one program (e.g. lbm or sphinx3) was running on core 0, it worked as I expected.

       

      As the number of subcaches allocated to core 0 decreased, the L3 cache miss ratio (L3 Cache Misses / Read Requests to L3 Cache) increased.
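
The miss ratio above can be computed per sampling interval from two snapshots of each counter. This is a minimal sketch, not the code used for the measurements: the function name is hypothetical, and the 48-bit counter width is an assumption that should be checked against the processor documentation.

```c
#include <stdint.h>

/* Miss ratio over one sampling interval, from two snapshots of each counter.
 * Returns -1.0 when no read requests were counted in the interval. */
double l3_miss_ratio(uint64_t req_prev, uint64_t req_now,
                     uint64_t miss_prev, uint64_t miss_now)
{
    /* ASSUMPTION: the counters are 48 bits wide. Masking the difference
     * handles a single wraparound between two samples. */
    const uint64_t mask = (1ULL << 48) - 1;
    uint64_t requests = (req_now - req_prev) & mask;
    uint64_t misses   = (miss_now - miss_prev) & mask;

    if (requests == 0)
        return -1.0;
    return (double)misses / (double)requests;
}
```

Working on per-interval differences rather than raw counter values keeps the ratio meaningful when the counters are left running across several benchmark runs.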

       

      But the result was the same even when two programs were running, on core 0 and core 3 respectively.

       

      I expected overall performance to be higher when the L3 cache is split into two partitions, one dedicated to each compute unit, than when the whole L3 cache is shared, because the programs wouldn't interfere with each other if they didn't share any subcaches.

       

      I think it was supposed to work that way because that is why we use cache partitioning.
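
Whether two compute units share any subcache can be checked directly from the partition register value used in the experimental setup. The sketch below is only illustrative: both helper names are hypothetical, and it ASSUMES that compute unit n owns the 4-bit subcache-enable field in bits [4n+3:4n] of the register, which should be verified against the processor documentation.

```c
#include <stdint.h>

/* Hypothetical helper: 4-bit subcache-enable mask of one compute unit,
 * ASSUMING compute unit n occupies bits [4n+3:4n] of the register value. */
unsigned subcache_mask(uint32_t reg, unsigned compute_unit)
{
    return (reg >> (4u * compute_unit)) & 0xFu;
}

/* Nonzero if the two compute units may allocate into a common subcache. */
int shares_subcache(uint32_t reg, unsigned cu_a, unsigned cu_b)
{
    return (subcache_mask(reg, cu_a) & subcache_mask(reg, cu_b)) != 0;
}
```

Under this assumed layout, shares_subcache(0x0000CC33, 0, 2) evaluates to 0 and shares_subcache(0x0000CC33, 0, 1) evaluates to 1. Which compute unit a given core belongs to is not stated in this post, so the indices passed in are the caller's assumption.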

       

      So what I'm wondering is why this result happened, and whether there is anything wrong with my procedure.


      The following is my experimental setup. The results can be seen in the attached Word file.


      P.S. I'd like to add some more information about my procedure.

          Example: for NBPMCx4E0 (Read Request to L3 Cache)

                int msrHandle = open("/dev/cpu/0/msr", O_RDWR);

                // Program the event select register (MSR 0xc0010240).
                unsigned long long value = 0x40050F7E0ULL;
                lseek(msrHandle, 0xc0010240, SEEK_SET);
                write(msrHandle, &value, sizeof(value));

                // Read the event counter register (MSR 0xc0010241) every second.
                // read() returns the number of bytes read; the counter value is stored in result.
                unsigned long long result = 0;
                lseek(msrHandle, 0xc0010241, SEEK_SET);
                read(msrHandle, &result, sizeof(result));
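
The MSR device is addressed by file offset, so each access has to name the register it targets. A minimal sketch of that pattern with pread/pwrite follows; the helper names msr_open, msr_read and msr_write are hypothetical, not part of any library, and this is not necessarily how the measurements in this post were taken.

```c
#define _FILE_OFFSET_BITS 64   /* keep off_t wide enough for 0xc001xxxx offsets */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Open the MSR device of one logical CPU; returns -1 on failure. */
int msr_open(int cpu)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    return open(path, O_RDWR);
}

/* Read one 64-bit MSR into *value; returns 0 on success, -1 on failure. */
int msr_read(int fd, uint32_t reg, uint64_t *value)
{
    ssize_t n = pread(fd, value, sizeof(*value), (off_t)reg);
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}

/* Write one 64-bit MSR; returns 0 on success, -1 on failure. */
int msr_write(int fd, uint32_t reg, uint64_t value)
{
    ssize_t n = pwrite(fd, &value, sizeof(value), (off_t)reg);
    return n == (ssize_t)sizeof(value) ? 0 : -1;
}
```

Passing the register address on every call avoids depending on the file position left behind by an earlier read or write, and checking the return value catches the case where the msr module is not loaded or the process lacks permission.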

       

      ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

      1.  Experimental Environment

      • OS
        • Ubuntu 12.04.1 LTS
      • Processor
        • 2 x 16 core 2.6 GHz AMD Opteron™ Processor 6282 SE
        • 8 MB L3 Cache per die
      • Benchmark Program
        • SpecCPU 2006

       

      2.  Experimental Setup

      • No Cache Partitioning – 4 Partitions for One Compute Unit
        • bash$ sudo setpci -s 00:18.4 1D4.l=0x0000FFFF  # NODE 0
        • bash$ sudo setpci -s 00:19.4 1D4.l=0x0000FFFF  # NODE 1
        • bash$ sudo setpci -s 00:1A.4 1D4.l=0x0000FFFF  # NODE 2
        • bash$ sudo setpci -s 00:1B.4 1D4.l=0x0000FFFF  # NODE 3
        • bash$ sudo setpci -s 00:18.3 1B8.l=0x08041000  # L3 BAN mode off
        • bash$ sudo setpci -s 00:19.3 1B8.l=0x08041000  # L3 BAN mode off
        • bash$ sudo setpci -s 00:1A.3 1B8.l=0x08041000  # L3 BAN mode off
        • bash$ sudo setpci -s 00:1B.3 1B8.l=0x08041000  # L3 BAN mode off
      • Cache Partitioning – 2 Partitions for One Compute Unit
        • bash$ sudo setpci -s 00:18.4 1D4.l=0x0000CC33  # NODE 0
        • bash$ sudo setpci -s 00:19.4 1D4.l=0x0000CC33  # NODE 1
        • bash$ sudo setpci -s 00:1A.4 1D4.l=0x0000CC33  # NODE 2
        • bash$ sudo setpci -s 00:1B.4 1D4.l=0x0000CC33  # NODE 3
        • bash$ sudo setpci -s 00:18.3 1B8.l=0x08041000  # L3 BAN mode off
        • bash$ sudo setpci -s 00:19.3 1B8.l=0x08041000  # L3 BAN mode off
        • bash$ sudo setpci -s 00:1A.3 1B8.l=0x08041000  # L3 BAN mode off
        • bash$ sudo setpci -s 00:1B.3 1B8.l=0x08041000  # L3 BAN mode off
      • MSR Handle
        • msrHandle = open("/dev/cpu/[CPU_ID]/msr",O_RDWR); // 0,8,16,24 used for each Node respectively
        • lseek(msrHandle, regAddress, SEEK_SET);
        • read(msrHandle, &value, sizeof(value));
        • write(msrHandle, &value, sizeof(value));
      • NB Performance Monitor Counter
        • NBPMCx4E0 Read Request to L3 Cache
          • NB Performance Event Select: MSRC001_0240 -> 0x40050F7E0
          • All cores on the node tracked.
          • Count prefetch and non-prefetch.
          • Count Read Block Modify.
          • Count Read Block Shared.
          • Count Read Block Exclusive.
        • NBPMCx4E1 L3 Cache Misses
          • NB Performance Event Select: MSRC001_0242 -> 0x40050F7E1
          • All cores on the node tracked.
          • Count prefetch and non-prefetch.
          • Count Read Block Modify.
          • Count Read Block Shared.
          • Count Read Block Exclusive.
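
The two event select values listed above can be decoded into an event number and a unit mask to double-check what each counter is programmed to count. The sketch below ASSUMES the usual AMD event-select layout (event bits [7:0] in register bits [7:0], event bits [11:8] in register bits [35:32], unit mask in bits [15:8], enable in bit 22); the function names are hypothetical and the layout should be confirmed in the processor documentation.

```c
#include <stdint.h>

/* Event number: register bits [35:32] are taken as event bits [11:8],
 * register bits [7:0] as event bits [7:0] (assumed layout). */
unsigned evtsel_event(uint64_t sel)
{
    return (unsigned)((((sel >> 32) & 0xF) << 8) | (sel & 0xFF));
}

/* Unit mask: register bits [15:8] (assumed layout). */
unsigned evtsel_unit_mask(uint64_t sel)
{
    return (unsigned)((sel >> 8) & 0xFF);
}

/* Counter enable: register bit 22 (assumed layout). */
int evtsel_enabled(uint64_t sel)
{
    return (int)((sel >> 22) & 1);
}
```

Under that layout, 0x40050F7E0 decodes to event 0x4E0 and 0x40050F7E1 to event 0x4E1, both with unit mask 0xF7 and the enable bit set.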