Does anyone know how to count the instructions that are tagged during a sample run, and in particular those that fall within some subset of the code?
I'd like to look at some hot spot and get an idea of how many cycles were spent performing those instructions.
I can look at load latency, or at the average tag-to-completion cycles, to spot code that is memory bound. But if both of these are fairly low and the code appears to be CPU bound, how would I find out how fast it is actually executing?
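
To make the question concrete, here is a rough sketch of the kind of tally I have in mind. It assumes the samples can be exported as (instruction address, tag-to-completion cycles) pairs. The `Sample` record, the `summarize_range` function, and the addresses are all made up for illustration and are not any profiler's real export format.

```python
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    # Hypothetical record layout; a real profiler export will differ.
    address: int            # address of the tagged instruction
    tag_to_completion: int  # cycles from tag to completion for this sample


def summarize_range(samples: Iterable[Sample], start: int, end: int) -> dict:
    """Count tagged samples whose address lies in [start, end)."""
    all_samples = list(samples)
    in_range = [s for s in all_samples if start <= s.address < end]
    count = len(in_range)
    total = len(all_samples)
    # Fraction of all tagged samples that landed in the range.
    share = count / total if total else 0.0
    # Mean tag-to-completion cycles over the samples in the range.
    avg = sum(s.tag_to_completion for s in in_range) / count if count else 0.0
    return {
        "tagged_in_range": count,
        "share_of_samples": share,
        "avg_tag_to_completion": avg,
    }


# Made-up samples: three inside the hot spot, one outside it.
samples = [
    Sample(0x1000, 4),
    Sample(0x1004, 6),
    Sample(0x2000, 30),
    Sample(0x1008, 5),
]
result = summarize_range(samples, 0x1000, 0x1010)
print(result)
```

A tally like this only tells me what share of the tagged instructions landed in the hot spot and how long they took on average from tag to completion. It does not by itself say how fast the code is running, which is the part I am asking about.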