I don't think event multiplexing helps in this case, mainly because events are configured identically on all cores, regardless of whether multiplexing is used.
For example, unit mask 0x17 is supposed to measure L3 misses for core 0. If that event/unit mask (0x17) is also programmed on core 2, it will generate wrong data.
If you are on Linux with CodeAnalyst (or Oprofile), there is no workaround unless you modify the Oprofile driver's event configuration.
If you are on Windows, you can try the command-line utility. For example, run
"caprofile.exe /e 0x4E117:5000:011 /mask 1" to configure L3 misses on core 0 only.
Then open a separate command prompt and run
"caprofile.exe /e 0x4E127:5000:011 /mask 2" to configure L3 misses on core 1 only.
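
To make the two commands above easier to generalize, here is a minimal Python sketch of how the numbers appear to be built. It is inferred only from the two examples in this post, so treat it as an assumption: the helper names are hypothetical, the split of 0x4E117 into event 0x4E1 plus unit mask 0x17 is read off the examples, and the "one core-select bit per core in bits 4-7" pattern is only assumed for cores 0-3. Check the BKDG for your processor revision before relying on it.

```python
# Hypothetical helpers; the layout is inferred from the examples in this thread,
# not taken from caprofile or BKDG documentation.

L3_REQUEST_TYPES = 0x07  # low three unit-mask bits, as seen in 0x17 and 0x27


def split_event(value):
    """Split a combined value such as 0x4E117 into (event_select, unit_mask)."""
    return value >> 8, value & 0xFF


def l3_unit_mask(core):
    """Unit mask selecting a single core (assumed pattern, cores 0-3 only)."""
    if not 0 <= core <= 3:
        raise ValueError("pattern is only assumed for cores 0-3")
    return (1 << (4 + core)) | L3_REQUEST_TYPES


def affinity_mask(core):
    """Value for the /mask argument: one bit per core."""
    return 1 << core


if __name__ == "__main__":
    for core in range(2):
        event_select, _ = split_event(0x4E117)
        combined = (event_select << 8) | l3_unit_mask(core)
        print(f"core {core}: /e {combined:#X}:5000:011 /mask {affinity_mask(core)}")
```

The point of the sketch is that the unit mask and the core affinity have to change together: the unit mask says whose L3 misses are counted, and the affinity mask says which core's counter is programmed.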
Thanks for your answer.
I want to use this feature in Xen.
I can modify the Oprofile driver in Xen, so I can set the performance counters as I said.
I'm asking about the architecture and functionality of the performance counters in the Opteron 6168, which has a shared L3 cache.
In fact, I have already implemented setting a different unit_mask for each core, but it does not seem to work correctly, and I don't know why. My implementation may have bugs, or the performance counters may not support this.
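
One thing that may be worth checking in such an implementation is the value actually written to each core's event-select register. The sketch below encodes the AMD PERF_CTL layout as I understand it (event select bits 7:0 in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22, event select bits 11:8 in bits 35:32). The function name and defaults are hypothetical, and this is not the Oprofile or Xen code. Because the event here is 0x4E1, a driver that keeps only the low 8 bits of the event select would end up programming event 0xE1 instead.

```python
# Hypothetical helper for sanity-checking register values; not Oprofile/Xen code.

def perf_ctl(event_select, unit_mask, usr=True, kernel=True):
    """Build a PERF_CTL value for a 12-bit event select and 8-bit unit mask."""
    value = event_select & 0xFF                  # EventSelect[7:0]
    value |= (unit_mask & 0xFF) << 8             # UnitMask
    value |= int(usr) << 16                      # count in user mode
    value |= int(kernel) << 17                   # count in kernel mode
    value |= 1 << 22                             # EN: enable the counter
    value |= ((event_select >> 8) & 0xF) << 32   # EventSelect[11:8]
    return value


if __name__ == "__main__":
    # Unit masks 0x17 and 0x27 are the core 0 / core 1 values quoted earlier in the thread.
    for core, unit_mask in enumerate((0x17, 0x27)):
        print(f"core {core}: PERF_CTL = {perf_ctl(0x4E1, unit_mask):#x}")
```

Comparing these values against what the modified driver writes on each core would at least separate an encoding bug from a hardware limitation.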
Can you answer my question again? Thanks in advance.
Have you found a way to count L3 cache misses on this machine?