I was reading "Basic Performance Measurements for AMD Athlon™ 64 and AMD Opteron™ Processors" by Paul J. Drongowski and have a question about measuring data cache misses.
The formula used is
DC_misses = DC_refill_L2 + DC_refill_sys
DC_refill_L2 = (select: 0x42, Unitmask: 0x1E)
DC_refill_sys = (select: 0x43, Unitmask: 0x1E).
In the BKDG for AMD Opteron, the description of EventSelect 0x42 says "UNIT_MASK bit 0 reflects refills which missed in the L2, and provides the same measure as the combined sub-events of event 43h".
So by adding these two counters, aren't we double-counting the refills that come from the northbridge?
I ran a little test to check whether Unitmask 01h for EventSelect 42h counts the same as Unitmask 1Eh for EventSelect 43h:
$ ./perfex -e 0x00410041 -e 0x00411D42 -e 0x00410142 -e 0x00411E43 ./a.out
event 0x00410041@0 119152
event 0x00411D42@1 133025
event 0x00410142@2 43866
event 0x00411E43@3 43858
So I believe 0x00410142 and 0x00411E43 count the same events.
But my question is: why doesn't 0x00411E42 then equal the sum of 0x00411D42 and 0x00410142?
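For anyone checking my event codes, here is a minimal sketch of how I am reading them. It assumes the usual K8 PERF_CTL layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22); the helper name decode_event is just something I made up for illustration.

```python
def decode_event(code):
    """Split a raw perfex event word into PERF_CTL fields (assumed K8 layout)."""
    return {
        "event_select": code & 0xFF,        # bits 7:0
        "unit_mask": (code >> 8) & 0xFF,    # bits 15:8
        "usr": bool(code & (1 << 16)),      # count user-mode events
        "os": bool(code & (1 << 17)),       # count kernel-mode events
        "enable": bool(code & (1 << 22)),   # counter enable
    }


if __name__ == "__main__":
    # The event words used in the two perfex runs.
    for code in (0x00410041, 0x00411D42, 0x00410142, 0x00411E43, 0x00411E42):
        f = decode_event(code)
        print(
            f"0x{code:08X}: select=0x{f['event_select']:02X} "
            f"unit_mask=0x{f['unit_mask']:02X} "
            f"usr={f['usr']} os={f['os']} en={f['enable']}"
        )
```

Under that assumption, every code above is a user-mode-only count with the counter enabled, and only the event select and unit mask differ between them.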
$ ./perfex -e 0x00410041 -e 0x00411D42 -e 0x00410142 -e 0x00411E42 ./a.out
event 0x00410041@0 120005
event 0x00411D42@1 124355
event 0x00410142@2 40344
event 0x00411E42@3 109072
Shouldn't 0x00411E42 have been 124355 + 40344 = 164699 instead of 109072?
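To double-check the assumption behind that sum, here is a small sketch that lists which unit-mask bits each value selects. It is plain bit arithmetic on the masks and says nothing about what the hardware counts for each bit; mask_bits is a hypothetical helper name.

```python
def mask_bits(unit_mask):
    """Return the set of bit positions that are set in an 8-bit unit mask."""
    return {bit for bit in range(8) if unit_mask & (1 << bit)}


if __name__ == "__main__":
    for mask in (0x1D, 0x01, 0x1E):
        print(f"0x{mask:02X} -> bits {sorted(mask_bits(mask))}")

    # Is 0x1E the bitwise union of 0x1D and 0x01?
    union = 0x1D | 0x01
    print(f"0x1D | 0x01 = 0x{union:02X}")
    print("equal to 0x1E:", union == 0x1E)

    # The sum I expected from the second run.
    print("expected sum:", 124355 + 40344)
```

Worth noting from this: 0x1D already contains bit 0 and 0x1E does not, so 0x1E is not the bitwise union of 0x1D and 0x01. Whether that fully explains the 109072 count I cannot say from these numbers alone.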