2 Replies Latest reply on May 28, 2009 11:15 AM by bbales2

    Memory bandwidth


      Would a measure of total L1 cache accesses (hits and misses) be representative of the number of load and store instructions executed in an application?


      It seems like non-temporal stores could mess up this number, but that isn't a problem here. I was curious if there were any other gotchas that could make this inaccurate.


      I am looking for total memory instructions executed. I do not care if they are found in L1 or in system memory.



        • Memory bandwidth

          According to the CodeAnalyst documentation, the L1 data cache access event counts all load and store accesses to the data cache. It may also include some "scratchpad accesses" made by microcoded (VectorPath) instructions, but those should be very rare.

          For Athlon 64 or Turion, each count represents one 8-byte access, even if only part of those 8 bytes is transferred. I don't know how that affects the 128-bit loads in Barcelona and Shanghai processors, though. (The CodeAnalyst documentation for Family 10h processors seems to be missing.)
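
          To make the 8-byte point concrete, here is a rough Python sketch that turns a raw L1 data cache access count into an upper bound on bytes moved and on bandwidth. The function names are made up for illustration, and the 8-byte width is only what is described above for Athlon 64 and Turion; treat it as an unverified assumption on Family 10h parts.

```python
# Hypothetical helpers; names and the 8-byte default are illustrative assumptions.
BYTES_PER_COUNT = 8  # per the Athlon 64 / Turion description; unverified for Family 10h


def max_bytes_accessed(access_count, bytes_per_count=BYTES_PER_COUNT):
    """Upper bound on bytes moved: every count is treated as a full-width access."""
    if access_count < 0:
        raise ValueError("access_count must be non-negative")
    return access_count * bytes_per_count


def max_bandwidth(access_count, elapsed_seconds, bytes_per_count=BYTES_PER_COUNT):
    """Upper bound on L1 data bandwidth in bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return max_bytes_accessed(access_count, bytes_per_count) / elapsed_seconds
```

          Because a count is recorded even when fewer than 8 bytes are transferred, the result overestimates the real traffic for byte- or word-sized accesses. It is a ceiling, not a measurement.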

          For non-cacheable, streaming-store, or write-combining accesses, use event 0x065, "Memory Requests by Type".
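
          If you collect both events, one way to combine them is sketched below in Python. This assumes the L1 data cache access event and event 0x065 do not count the same access twice, which I have not confirmed from the documentation, so treat the total as an estimate. The function name and arguments are hypothetical.

```python
def combine_counts(l1_data_accesses, uncached_requests):
    """Sum the two event counts and report the share that came from event 0x065.

    Assumption (not confirmed by the documentation): the two events do not overlap.
    """
    if l1_data_accesses < 0 or uncached_requests < 0:
        raise ValueError("counts must be non-negative")
    total = l1_data_accesses + uncached_requests
    # Avoid dividing by zero when neither event fired.
    uncached_share = uncached_requests / total if total else 0.0
    return total, uncached_share
```

          Note that this total counts accesses, not instructions: a single instruction can generate more than one access, so it only approximates the number of load and store instructions executed.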