I'm kind of new to CodeAnalyst and I find that I frequently get results that I don't expect or have difficulty interpreting.
Here's an example. Consider the following little test program:
#include <cstdlib>

int main ()
{
    size_t size = 1000000;
    int iters = 1000;
    unsigned char *buf = (unsigned char *)malloc (size);
    register unsigned char sum = 0;
    for (register int i = 0; i < iters; i++)
        for (register int j = 0; j < size; j++)
            sum += buf[j];
    return sum;
}
Compiled as follows:
g++ -o simple simple.cpp
(So - no optimization.)
I would expect this to do only reads, and no (or very few) writes to main memory. In fact, if I look at events 0x6C (reads) and 0x6D (writes), it seems to do about as many writes as reads, if I'm interpreting the results correctly. Hmmm ... maybe "sum" isn't being put in a register, in spite of the "register" keyword. That's the only theory I have, but I'm not sure I believe it.
The actual results I got from one run were 7566 for reads, 31897 for writes and 3128 for DRAM accesses - all with a sample period of 10,000. And ... hmmm ... maybe that sample period should be 500,000. But, still ...
Another question: Why is event 0xE0 (DRAM accesses) not equal to the sum of event 0x6C (reads) and 0x6D (writes)?
What I'm ultimately trying to determine is if a real program (not the above test) is bumping up against memory bandwidth limits, but I'm not sure which event or events I should look at. BTW - I have looked at Paul Drongowski's "Basic Performance Measurements ..." document, which is certainly very helpful, but still leaves me with some questions. (Maybe I'm just thick!)