

bbales2
Journeyman III

Memory bandwidth

Would a measure of total L1 cache accesses (hits and misses) be representative of the number of load and store instructions executed in an application?

 

It seems like non-temporal stores could mess up this number, but that isn't a problem here. I was curious if there were any other gotchas that could make this inaccurate.

 

I am looking for total memory instructions executed. I do not care if they are found in L1 or in system memory.

 

Ben

2 Replies
edward_yang
Journeyman III

According to CodeAnalyst documentation, the L1 data cache access event includes all accesses to the data cache for load and store. It may also include some "scratchpad accesses" due to microcoded (vectorpath) instructions, but that should be very rare.

For Athlon 64 or Turion, each count represents an access of up to 8 bytes, even if only part of that is transferred. I don't know how that affects the 128-bit loads in Barcelona and Shanghai processors, though. (The CodeAnalyst documentation for Family 10h processors seems to be missing.)
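To make the 8-byte point concrete, here is a minimal sketch of the arithmetic, assuming the Athlon 64 / Turion behaviour described above (one count per access of at most 8 bytes). The function names and the elapsed-time parameter are made up for illustration and are not part of CodeAnalyst. Because a count may move fewer than 8 bytes, the result is only an upper bound.

```python
# Assumption: one counter increment covers an access of at most 8 bytes
# (Athlon 64 / Turion). Other processor families may differ.
BYTES_PER_COUNT = 8


def max_bytes_accessed(l1_access_count: int,
                       bytes_per_count: int = BYTES_PER_COUNT) -> int:
    """Upper bound on the bytes moved through the L1 data cache."""
    if l1_access_count < 0:
        raise ValueError("access count cannot be negative")
    return l1_access_count * bytes_per_count


def max_l1_bandwidth(l1_access_count: int, elapsed_seconds: float,
                     bytes_per_count: int = BYTES_PER_COUNT) -> float:
    """Upper bound on L1 data traffic in bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return max_bytes_accessed(l1_access_count, bytes_per_count) / elapsed_seconds
```

For example, one million counted accesses over half a second would bound the L1 traffic at 16 MB/s; the real figure is lower whenever accesses are narrower than 8 bytes.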

For non-cacheable, streaming-store, or write-combining accesses, use event 0x065, "Memory Requests by Type".
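If you want a single number, one rough way to combine the two events is sketched below. It assumes the non-cacheable, streaming-store and write-combining requests are not already included in the L1 data cache access count, and it treats each request as one memory operation, which may undercount when write-combining merges several stores into one request. The class and field names are hypothetical; the values would come from whatever your profiler reports for each event.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counts from one profiling run (hypothetical container)."""
    l1_dcache_accesses: int      # L1 data cache access event
    uncached_requests: int = 0   # event 0x065, memory requests by type


def estimate_memory_ops(sample: CounterSample) -> int:
    """Rough estimate of loads and stores executed.

    Assumption: the two events do not overlap, and each counted
    request corresponds to one memory operation.
    """
    return sample.l1_dcache_accesses + sample.uncached_requests


def uncached_fraction(sample: CounterSample) -> float:
    """Share of the estimate that bypassed the L1 data cache."""
    total = estimate_memory_ops(sample)
    return sample.uncached_requests / total if total else 0.0
```

If `uncached_fraction` comes out near zero, as it should when there are no non-temporal stores, the L1 access count alone is a reasonable proxy.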


Alright, sounds good.

 

Thanks,

Ben
