Using the Performance Monitoring Counters, the L3 cache partitioning feature that AMD provides, and SPEC CPU2006,
I have been measuring how performance changes with the number of L3 subcaches allocated to a core.
When one program (e.g., lbm or sphinx3) was running on core 0, it worked as I expected:
as the number of subcaches allocated to core 0 decreased, the L3 cache miss ratio (L3 Cache Misses / Read Requests to L3 Cache) increased.
But the result was the same even when two programs were running on core 0 and core 3, respectively.
I expected overall performance to be higher when the L3 cache was partitioned into two parts, each dedicated to one compute unit, than when the whole L3 cache was shared,
because the two programs cannot interfere with each other if they share no subcaches.
That, after all, is the point of cache partitioning.
So what I am wondering is why this result occurred, and whether there is anything wrong with my procedure.
The following is my experimental setup; the results can be seen in the attached Word file.
P.S. I'd like to add some more information about my procedure.
For example, for NBPMCx4E0 (Read Request to L3 Cache):
#include <fcntl.h>
#include <unistd.h>

int msrHandle = open("/dev/cpu/0/msr", O_RDWR);

/* Program the NB event select register (MSR 0xC0010240). */
unsigned long long value = 0x40050F7E0ULL;
pwrite(msrHandle, &value, sizeof(value), 0xC0010240);

/* Read the NB event counter (MSR 0xC0010241) into 'result'. I do this every second. */
unsigned long long result = 0;
pread(msrHandle, &result, sizeof(result), 0xC0010241);
1. Experimental Environment
2. Experimental Setup