mah
Adept I

Modifying MSR to disable the prefetcher

Hello,

On my AMD Opteron 6274 (family 15h), I modified an MSR to disable the hardware prefetcher. According to the BKDG, bit 13 of MSRC001_1022 has to be set to 1, so I ran:

[root@tiger exe]# wrmsr -a 0xc0011022 0x2000

[root@tiger exe]# rdmsr -a -x -0 0xc0011022

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

0000000000002000

As you can see, the bit is set on all 32 cores, so the prefetcher should be disabled. However, when I run the ocount command, I still see non-zero counts for the prefetcher events.

[root@tiger exe]# ocount -e CPU_CLK_UNHALTED,RETIRED_INSTRUCTIONS,DATA_CACHE_ACCESSES,DATA_CACHE_MISSES,DATA_PREFETCHER,PREFETCH_INSTRUCTIONS_DISPATCHED,REQUESTS_TO_L2,L2_CACHE_MISS,L2_PREFETCHER_TRIGGER ./bzip2_base.amd64-m64-gcc44-nn

spec_init

Loading Input Data

Duplicating 13329296 bytes

Input data 67108864 bytes in length

Compressing Input Data, level 5

Compressed data 15115419 bytes in length

Uncompressing Data

Uncompressed data 67108864 bytes in length

Uncompressed data compared correctly

Compressing Input Data, level 7

Compressed data 14615506 bytes in length

Uncompressing Data

Uncompressed data 67108864 bytes in length

Uncompressed data compared correctly

Compressing Input Data, level 9

Compressed data 14448493 bytes in length

Uncompressing Data

Uncompressed data 67108864 bytes in length

Uncompressed data compared correctly

Tested 64MB buffer: OK!

Events were actively counted for 35.7 seconds.

Event counts (scaled) for /home/mahmood/spec-cpu2006-x86_64/exe/bzip2_base.amd64-m64-gcc44-nn:

        Event                                    Count                    % time counted

        CPU_CLK_UNHALTED                         107,255,826,409          55.54

        DATA_CACHE_ACCESSES                      55,980,588,486           66.67

        DATA_CACHE_MISSES                        1,713,225,386            66.65

        DATA_PREFETCHER                          1,069,985,468            66.66

        L2_CACHE_MISS                            243,353,144              66.67

        L2_PREFETCHER_TRIGGER                    239,087,660              55.57

        PREFETCH_INSTRUCTIONS_DISPATCHED         78,746                   66.68

        REQUESTS_TO_L2                           2,793,780,634            44.45

        RETIRED_INSTRUCTIONS                     108,770,366,847          55.56

Why are the prefetcher stats non-zero?
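
As a sanity check on the value written above, here is a minimal shell sketch of the bit arithmetic. It assumes bits are numbered from 0 at the least significant end, which matches 0x2000 being bit 13. The helper names `bit_mask` and `bit_is_set` are made up for this illustration and are not part of msr-tools.

```shell
#!/bin/sh
# bit_mask BIT: print the hex mask that has only bit BIT set (bit 0 = LSB).
bit_mask() {
    printf '0x%x\n' $(( 1 << $1 ))
}

# bit_is_set VALUE BIT: succeed (exit status 0) if bit BIT is set in VALUE.
# VALUE needs a 0x prefix: the zero-padded output of "rdmsr -x -0" would
# otherwise be parsed as octal because of its leading zeros.
bit_is_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

bit_mask 13    # prints 0x2000, the value passed to wrmsr in this post

if bit_is_set 0x0000000000002000 13; then
    echo "bit 13 is set"
fi
```

Note that `wrmsr -a 0xc0011022 0x2000` writes the whole register, so any other bits that were set before are cleared. The readback only confirms that the bit is set, not that the prefetch events being counted are controlled by it.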

1 Reply
mah
Adept I

Here are my findings on the AMD prefetcher.

According to the BKDG, there are two MSRs for that:

1) MSRC001_102B, Combined Unit Configuration 3 (CU_CFG3), on page 591. Bit 18 is described as “PfcDis. Read-write. Reset: 0. 1=Prefetcher disabled”.
The default value is 0, so I set it to 1. Now the prefetch counts are zero:

Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':

    55,860,447,518 L1-dcache-loads:uk

                 0 L1-dcache-prefetches:uk

                 0 L1-dcache-prefetch-misses:uk

  36.372604375 seconds time elapsed

2) MSRC001_1022, which I described in my previous post.

I still have a question: what is the difference between these two MSRs? I think the first one (MSRC001_102B) is related to the L2 prefetcher, but I am not sure. Thanks for any reply.
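
For anyone repeating this, a read-modify-write is probably safer than writing a literal value, because it leaves the other bits of the register as they were. Below is a rough sketch under a few assumptions: msr-tools is installed, the msr kernel module is loaded, and the commands run as root. The function names `msr_with_bit_set` and `set_msr_bit_on_cpu` are hypothetical helpers, not existing tools.

```shell
#!/bin/sh
# msr_with_bit_set CURRENT BIT: print CURRENT with bit BIT also set,
# leaving every other bit unchanged. CURRENT must carry a 0x prefix.
msr_with_bit_set() {
    printf '0x%x\n' $(( $1 | (1 << $2) ))
}

# set_msr_bit_on_cpu CPU REG BIT: read the register on one CPU, OR in the
# bit, and write the result back. Defined here but not called, because it
# needs root and real hardware.
set_msr_bit_on_cpu() {
    cur="0x$(rdmsr -p "$1" -x "$2")" || return 1
    wrmsr -p "$1" "$2" "$(msr_with_bit_set "$cur" "$3")"
}

# Pure arithmetic, safe to run anywhere:
msr_with_bit_set 0x0 18       # prints 0x40000 (bit 18 only)
msr_with_bit_set 0x2000 18    # prints 0x42000 (bit 13 kept, bit 18 added)

# Possible usage on the 32-core machine from this thread (not executed here):
#   for cpu in $(seq 0 31); do set_msr_bit_on_cpu "$cpu" 0xc001102b 18; done
```

The per-CPU loop is used instead of `-a` because each core may hold a different current value, so each one is read and written back individually.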

