4 Replies Latest reply on Sep 12, 2012 12:58 PM by wkohlani

    L3-cache miss vs. L2-L1 misses




      I'm seeing almost 30x more L3 cache misses than L2 cache misses on an AMD Magny-Cours system. Can anyone explain why?



        • Re: L3-cache miss vs. L2-L1 misses

          Hi Hankendi,


          The Magny-Cours development page on developer.amd.com mentions that the L3 is shared across cores. Are you using performance event 4E1h? If so, the BKDG says:

          To determine the total number of cache misses from one core, select only a single core using UnitMask[7:4] and set UnitMask[2:0] to 111b.
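The bit layout quoted above can be sketched as a small helper that packs the two fields into one unit mask byte. This is only an illustration: the function name `l3_unit_mask` is hypothetical, and the value that selects a given core in UnitMask[7:4] is passed in as a parameter rather than assumed, so it should be taken from the BKDG for the processor revision in use.

```python
def l3_unit_mask(core_select: int, read_types: int = 0b111) -> int:
    """Pack UnitMask[7:4] (core select) and UnitMask[2:0] (read types) into one byte.

    core_select is the raw 4-bit value for UnitMask[7:4]; which value selects
    which core is an assumption the caller must check against the BKDG.
    """
    if not 0 <= core_select <= 0xF:
        raise ValueError("core_select must fit in 4 bits (UnitMask[7:4])")
    if not 0 <= read_types <= 0b111:
        raise ValueError("read_types must fit in 3 bits (UnitMask[2:0])")
    # Bit 3 is left clear; only the two fields named in the quote are set.
    return (core_select << 4) | read_types


mask = l3_unit_mask(0x1)
print(f"{mask:#04x} = {mask}")  # prints "0x17 = 23"
```

Printing the decimal value as well as the hex value helps when a tool only accepts the mask as an integer from 0 to 255.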



          It sounds like you may be getting the event on all cores instead of just one.  For the L2 misses, are you using event 07Eh?


          How many cores do you have?  What's the system topology, if you know it?




            • Re: L3-cache miss vs. L2-L1 misses

              Magny-Cours is a 12-core multi-chip processor: two 6-core processors in a single package. For L2, I'm using 77Eh, as suggested here: http://developer.amd.com/Assets/intro_to_ca_v3_final.pdf


              For L3, I'm using f74e1h, which counts L3 cache misses from any core and is also suggested in the same document. How can I derive the total number of L3 cache misses from that event? It seems to give me some sort of aggregated result, but I still get much higher numbers even when I normalize by 12. One other odd behavior: either the first set of 6 cores or the second set of 6 cores shows approximately 100 times more L3 cache misses than the other set.

              • Re: L3-cache miss vs. L2-L1 misses

                I have a similar problem on a Magny-Cours system with 4 CPUs, 12 cores each (two groups of 6 cores).


                I'm using PAPI, which allows me to set the counter mask (Unit Mask) as an integer from 0 to 255.

                I'm trying to get the L3 misses on one specific core. Following your advice and the documentation, I select the core using UnitMask[7:4] and set UnitMask[2:0] to 111b. Strangely, when I do that, I get a zero count.

                I tried all combinations for the UnitMask: I ran my little app with every value from 0 to 255, and the only times I get a count are when the unit mask is 0 or 1.

                I use taskset -c to pin my run to a specific core. I tried pinning to cores 0, 1, 2, 3, 4, and 5, but the same thing happens.


                Am I doing something wrong?


                The command I'm using looks like the following (note: I'm using PAPI through the papiex command-line tool; everything else works, but nothing involving L3 seems to work):


                taskset -c 21 papiex -eL3_CACHE_MISSES:c=7 -- ./a.out


                where c specifies the mask as an integer from 0 to 255
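For reference, the sweep over all 256 mask values described above could be scripted roughly as follows. This is a minimal sketch that only builds the command lines and does not run them. The helper name `papiex_command` is hypothetical, and it reuses only the flags that appear in the command above; whether `c=` really maps to the hardware unit mask is taken from the post as-is and not verified here.

```python
import shlex


def papiex_command(core: int, mask: int, program: str = "./a.out") -> list:
    """Build the argument list for one pinned papiex run (flags copied from the post)."""
    if not 0 <= mask <= 255:
        raise ValueError("mask must be an integer from 0 to 255")
    return [
        "taskset", "-c", str(core),          # pin the run to one core
        "papiex", f"-eL3_CACHE_MISSES:c={mask}",
        "--", program,
    ]


# One command per mask value, all pinned to core 21 as in the example above.
commands = [papiex_command(21, mask) for mask in range(256)]
print(shlex.join(commands[7]))
# prints "taskset -c 21 papiex -eL3_CACHE_MISSES:c=7 -- ./a.out"
```

Each entry in `commands` can be passed to `subprocess.run` to execute the sweep, which makes it easy to log which mask values return a non-zero count.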


                I'd appreciate any input.