I have a very simple test program: a loop that accesses an array. I've noticed that the number of read requests to the L3 cache (event 0x4E0) is many times greater than the number of L2 cache misses (event 0x7E). I'm using a unit mask of 0xF1 for event 0x4E0 and a unit mask of 0x6 for event 0x7E, so I'm only looking at data, not instructions. What I'm wondering is: where are all these L3 read requests coming from, if not from L2 cache misses? The ratio is almost exactly 32 to 1, which is itself kind of suspicious. Is the cache line size somehow involved in a way that is not obvious to me?
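For anyone trying to reproduce this, here is a minimal sketch of the kind of loop described above. It is not the original test program: the function name `touch_lines`, the 64-byte line size, and the one-read-per-line stride are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes. 64 is common on x86, but it is an
 * assumption here, not something stated in the post. */
#define LINE_BYTES 64

/*
 * Hypothetical helper: read one byte from each assumed cache line of buf
 * and return the sum of the bytes read. The volatile qualifier keeps the
 * compiler from optimizing the loads away.
 */
uint64_t touch_lines(const volatile uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i += LINE_BYTES)
        sum += buf[i];
    return sum;
}
```

Calling this on a buffer considerably larger than the L2 cache, with the two counters enabled around the call, should give a workload where each access touches a new line, which makes the two event counts easier to compare.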
I'm having the exact same problem. Were you able to resolve that issue? Thanks.