Hello,
On my AMD Opteron 6274 (family 15h), I modified an MSR to disable the hardware prefetcher. According to the BKDG, I have to set bit 13 of MSRC001_1022 to 1, so I ran:
[root@tiger exe]# wrmsr -a 0xc0011022 0x2000
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
0000000000002000
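To make the bit arithmetic explicit, here is a minimal sketch of the check behind the output above. It assumes the bit index is zero-based, which is what makes bit 13 correspond to 0x2000. The helper names (`bit_mask`, `set_bit`, `all_cores_have_bit`) are hypothetical and do not come from msr-tools.

```python
# Hypothetical helpers; only the bit position (13) and the value 0x2000
# come from the post. Assumption: bit indices are zero-based.
PREFETCH_DISABLE_BIT = 13


def bit_mask(bit):
    """Return an integer with only the given (zero-based) bit set."""
    return 1 << bit


def set_bit(value, bit):
    """Set one bit in a register value while preserving all other bits."""
    return value | bit_mask(bit)


def all_cores_have_bit(rdmsr_lines, bit):
    """True if every hex line (one per core, as printed by rdmsr -a -x)
    has the given bit set. Blank lines are ignored."""
    values = [int(line, 16) for line in rdmsr_lines if line.strip()]
    return bool(values) and all(v & bit_mask(bit) for v in values)


if __name__ == "__main__":
    print(hex(bit_mask(PREFETCH_DISABLE_BIT)))  # prints 0x2000
    sample = ["0000000000002000"] * 32          # the rdmsr output above
    print(all_cores_have_bit(sample, PREFETCH_DISABLE_BIT))  # prints True
```

Note that `wrmsr -a 0xc0011022 0x2000` writes the whole 64-bit register, so any other bits that were set before are cleared. `set_bit` shows the OR step that would preserve them if the previous value were read first.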
The rdmsr output shows that bit 13 is now set on all 32 cores, so the prefetcher should be disabled. However, when I run the ocount command, I still see non-zero counts for the prefetcher events.
[root@tiger exe]# ocount -e CPU_CLK_UNHALTED,RETIRED_INSTRUCTIONS,DATA_CACHE_ACCESSES,DATA_CACHE_MISSES,DATA_PREFETCHER,PREFETCH_INSTRUCTIONS_DISPATCHED,REQUESTS_TO_L2,L2_CACHE_MISS,L2_PREFETCHER_TRIGGER ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Loading Input Data
Duplicating 13329296 bytes
Input data 67108864 bytes in length
Compressing Input Data, level 5
Compressed data 15115419 bytes in length
Uncompressing Data
Uncompressed data 67108864 bytes in length
Uncompressed data compared correctly
Compressing Input Data, level 7
Compressed data 14615506 bytes in length
Uncompressing Data
Uncompressed data 67108864 bytes in length
Uncompressed data compared correctly
Compressing Input Data, level 9
Compressed data 14448493 bytes in length
Uncompressing Data
Uncompressed data 67108864 bytes in length
Uncompressed data compared correctly
Tested 64MB buffer: OK!
Events were actively counted for 35.7 seconds.
Event counts (scaled) for /home/mahmood/spec-cpu2006-x86_64/exe/bzip2_base.amd64-m64-gcc44-nn:
Event                              Count              % time counted
CPU_CLK_UNHALTED                   107,255,826,409    55.54
DATA_CACHE_ACCESSES                55,980,588,486     66.67
DATA_CACHE_MISSES                  1,713,225,386      66.65
DATA_PREFETCHER                    1,069,985,468      66.66
L2_CACHE_MISS                      243,353,144        66.67
L2_PREFETCHER_TRIGGER              239,087,660        55.57
PREFETCH_INSTRUCTIONS_DISPATCHED   78,746             66.68
REQUESTS_TO_L2                     2,793,780,634      44.45
RETIRED_INSTRUCTIONS               108,770,366,847    55.56
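To put the counts above in proportion, here is a minimal sketch that parses the ocount table and derives a few ratios. The numbers are copied from the output above; the function names are hypothetical. It also assumes that a "% time counted" value below 100 means the event was multiplexed with others, so that its count is a scaled estimate rather than an exact value.

```python
# Counts copied verbatim from the ocount output in this post.
OCOUNT_LINES = """\
CPU_CLK_UNHALTED 107,255,826,409 55.54
DATA_CACHE_ACCESSES 55,980,588,486 66.67
DATA_CACHE_MISSES 1,713,225,386 66.65
DATA_PREFETCHER 1,069,985,468 66.66
L2_CACHE_MISS 243,353,144 66.67
L2_PREFETCHER_TRIGGER 239,087,660 55.57
PREFETCH_INSTRUCTIONS_DISPATCHED 78,746 66.68
REQUESTS_TO_L2 2,793,780,634 44.45
RETIRED_INSTRUCTIONS 108,770,366,847 55.56
""".splitlines()


def parse_ocount_line(line):
    """Split 'EVENT 1,234 56.78' into (event, count, percent_time_counted)."""
    name, count, pct = line.split()
    return name, int(count.replace(",", "")), float(pct)


def parse_counts(lines):
    """Map each event name to a (count, percent_time_counted) tuple."""
    parsed = (parse_ocount_line(line) for line in lines if line.strip())
    return {name: (count, pct) for name, count, pct in parsed}


def ratio(counts, numerator, denominator):
    """Ratio of two scaled event counts."""
    return counts[numerator][0] / counts[denominator][0]


if __name__ == "__main__":
    counts = parse_counts(OCOUNT_LINES)
    print("instructions per cycle: %.3f"
          % ratio(counts, "RETIRED_INSTRUCTIONS", "CPU_CLK_UNHALTED"))
    print("data cache miss ratio:  %.4f"
          % ratio(counts, "DATA_CACHE_MISSES", "DATA_CACHE_ACCESSES"))
    print("DATA_PREFETCHER per data cache miss: %.3f"
          % ratio(counts, "DATA_PREFETCHER", "DATA_CACHE_MISSES"))
    # Assumption: events counted for less than 100% of the run were multiplexed.
    multiplexed = sorted(n for n, (_, pct) in counts.items() if pct < 100.0)
    print("multiplexed events:", ", ".join(multiplexed))
```

Every event in this run was counted for less than 100% of the time, so under that assumption all nine counts are estimates and small differences between them should not be over-interpreted.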
Why are the prefetcher stats non-zero?