I have adapted the sample code provided by the CodeAnalyst team into a multithreaded program. Each thread in my code reads floating-point values from predefined matrices, adds them, and writes the sums into a results matrix.
When I run the code on my hardware (an AMD Athlon X2 6400+ dual-core processor with L1 and L2 caches), I get the following results in CodeAnalyst's Data Cache and Level 2 Accesses profiles:
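To make the access pattern concrete, here is a minimal sketch of the kind of workload described above. It is not the actual program: it is written in Python purely for illustration, and the matrix size, the thread count, the row-wise split of work, and the names a, b, result and add_rows are all assumptions.

```python
import threading

# Hypothetical sizes; the real program's dimensions are not stated.
ROWS, COLS, NUM_THREADS = 4, 4, 2

# Two predefined input matrices and one results matrix.
a = [[float(r * COLS + c) for c in range(COLS)] for r in range(ROWS)]
b = [[1.0 for _ in range(COLS)] for _ in range(ROWS)]
result = [[0.0] * COLS for _ in range(ROWS)]

def add_rows(first_row, last_row):
    """Add the inputs element-wise for rows [first_row, last_row)."""
    for r in range(first_row, last_row):
        for c in range(COLS):
            result[r][c] = a[r][c] + b[r][c]

# Assumed partitioning: each thread handles a contiguous band of rows,
# so no two threads write to the same element of the results matrix.
rows_per_thread = ROWS // NUM_THREADS
threads = [
    threading.Thread(
        target=add_rows,
        args=(t * rows_per_thread, (t + 1) * rows_per_thread),
    )
    for t in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each element involves two reads (one from each input matrix) and one write (to the results matrix), which is the traffic the data-cache counters would be measuring in the real program.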
DC Accesses 216 346
DC Misses 317
DC refills L2/sys 281
L2 Requests 637
L2 Misses 479
From these results, I’m wondering if anyone can help explain:
(i) Why are the DC misses (and/or DC refills) fewer than the L2 misses?
(ii) Why do the L2 requests (637) differ from the DC misses (317)? I thought every L1 miss should result in exactly one L2 request, so the two counts should be equal.
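For reference, the comparisons behind both questions can be checked with a few lines of arithmetic. This is only a sketch over the numbers posted above: the dictionary keys are made-up names, and it assumes "216 346" is a single value (216,346) written with a space as the thousands separator.

```python
# Counter values copied from the profile; key names are hypothetical.
counters = {
    "dc_accesses": 216346,      # assumes "216 346" means 216,346
    "dc_misses": 317,
    "dc_refills_l2_sys": 281,
    "l2_requests": 637,
    "l2_misses": 479,
}

# Fraction of data-cache accesses that missed in L1.
dc_miss_rate = counters["dc_misses"] / counters["dc_accesses"]

# Fraction of L2 requests that missed in L2.
l2_miss_rate = counters["l2_misses"] / counters["l2_requests"]

# L2 requests per DC miss; a value of 1.0 would match the expectation
# that each L1 data-cache miss produces exactly one L2 request.
l2_requests_per_dc_miss = counters["l2_requests"] / counters["dc_misses"]

# L2 misses that cannot be attributed to DC misses if each DC miss
# caused at most one L2 miss.
l2_misses_beyond_dc_misses = counters["l2_misses"] - counters["dc_misses"]

print(f"DC miss rate:            {dc_miss_rate:.4%}")
print(f"L2 miss rate:            {l2_miss_rate:.2%}")
print(f"L2 requests per DC miss: {l2_requests_per_dc_miss:.2f}")
print(f"L2 misses beyond DC misses: {l2_misses_beyond_dc_misses}")
```

With these values the L2 counters exceed the DC miss count in both cases, which is what I cannot account for.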
Thanks in advance for your help.