Server Processors

dm_1 · 02-01-2023

I am currently investigating the performance of some MCMC algorithms on multicore processors, including my Ryzen 9 5900X. As I currently understand it, this chip has two chiplets, each with 6 active cores and its own 32 MB L3 cache. Correct me if I'm wrong.

I use AMDuProfPcm to profile the application's performance. I see that it has options for core affinity, so I can pin threads to certain cores, which is great. However, I am wondering if there is a way to check the performance of the L3 cache shared by those cores.

For example, let's say I pin 2 threads to Core 0 and Core 1. How do I profile the L3 cache stats for that particular CCX, and not the second L3 cache that is connected to Cores 6-11?

dipak · 02-01-2023

Thanks for your query. I am moving this post to the AMD Server Gurus community, which is the right place for any AMDuProfPcm-related query.

Thanks.

santosh_zanjurne · 02-02-2023

Hi, AMDuProfPcm is currently supported only on AMD EPYC systems. Please refer to chapter 3.1 of the uProf user guide for the prerequisites. On Zen 3 systems, the CCD and the CCX are the same, so all the cores in one CCD share one block of L3 cache. You can use the first command below to get the relationship between CCDs and cores on your system. Once you have this information, you can choose the CCD that contains the core you want to profile.
The second command profiles the L3 cache belonging to CCD 0 for 10 seconds.

# AMDuProfPcm -n
# AMDuProfPcm -m l3 -c ccd=0 -d 10
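If AMDuProfPcm cannot be used on a given machine, Linux also exposes which logical CPUs share each cache through sysfs, without root. The following is a minimal bash sketch, assuming the standard /sys/devices/system/cpu/cpuN/cache/indexM/ layout with its "level" and "shared_cpu_list" files. The function name l3_siblings and its optional second argument (an alternative sysfs root, useful for testing) are hypothetical. On a Zen 3 part, the CPUs it prints for a given core should be the ones on that core's CCD.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logical CPUs that share an L3 cache with a
# given logical CPU, read from Linux sysfs (no root privileges required).
# Usage: l3_siblings CPU [SYSFS_CPU_ROOT]

l3_siblings() {
  local cpu=$1
  local root=${2:-/sys/devices/system/cpu}
  local idx
  for idx in "$root/cpu$cpu"/cache/index*; do
    # Each indexN directory describes one cache; skip anything unreadable
    # (including the literal pattern left over when nothing matches).
    [ -r "$idx/level" ] || continue
    if [ "$(cat "$idx/level")" = "3" ]; then
      # Printed in the kernel's cpulist range format, e.g. "0-5,12-17".
      cat "$idx/shared_cpu_list"
      return 0
    fi
  done
  echo "l3_siblings: no L3 entry found for cpu$cpu under $root" >&2
  return 1
}

# Example: show which logical CPUs share an L3 cache with CPU 0.
if [ -d /sys/devices/system/cpu/cpu0/cache ]; then
  echo "CPU 0 shares L3 with: $(l3_siblings 0)"
fi
```

Run it as-is to print the L3 group of CPU 0, or source it and call, for example, l3_siblings 6 to check which group CPU 6 belongs to.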

dm_1 · 02-05-2023

Just a follow-up: since I have a 12-core, 24-thread chip, why does htop show 24 CPUs?

For example, I want to use AMDuProfPcm to launch the application on 1, 2, 4, and 6 cores of CCX 0, but htop shows that it launches on CPUs 0-12. In other words, I am not sure why there are 24 CPUs.

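santosh_zanjurne · 02-06-2023

SMT (simultaneous multithreading) must be enabled in your system BIOS. Check your settings with the lscpu Linux command.

In its output, look at the "Thread(s) per core" field: it should be 2, meaning each of the 12 physical cores is exposed as two logical CPUs, which is why htop lists 24. Also, AMDuProfPcm is not supported on client systems.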
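The same count can be reproduced from sysfs. Below is a minimal bash sketch, assuming Linux and bash 4+ (for the associative array): every online logical CPU has a topology/thread_siblings_list file, and all SMT threads of one physical core report the same list, so counting the distinct lists gives the number of physical cores. The function name count_cpus is hypothetical. With SMT enabled on a 12-core part you would expect logical=24 physical=12.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count online logical CPUs and distinct physical cores
# from Linux sysfs. Requires bash 4+ for the associative array.
# Usage: count_cpus [SYSFS_CPU_ROOT]

count_cpus() {
  local root=${1:-/sys/devices/system/cpu}
  local d logical=0
  local -A cores=()
  for d in "$root"/cpu[0-9]*; do
    # Offline CPUs may lack topology files; skip them.
    [ -r "$d/topology/thread_siblings_list" ] || continue
    logical=$((logical + 1))
    # All SMT threads of one physical core report the same sibling list,
    # so each distinct list corresponds to exactly one physical core.
    cores["$(cat "$d/topology/thread_siblings_list")"]=1
  done
  echo "logical=$logical physical=${#cores[@]}"
}

count_cpus
```

If the two numbers are equal, SMT is off (or unavailable) and htop would show one CPU per physical core.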

dm_1 · 02-06-2023

Thanks for the help! Since it is not supported, does that mean the measurements I am taking are inaccurate? If so, what alternatives are available to perform the same function?

santosh_zanjurne · 02-07-2023

For the system analysis tool, AMDuProfPcm, we do not support all features on client systems. Which metrics do you see enabled on your system? AMDuProfPcm -h will show you this; only those metrics will work.
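To check this from a script, the metric list can be pulled out of the -h text. This is a minimal bash sketch, assuming the help keeps the "Supported METRIC's are - <...>" line in the format of the help output posted later in this thread; the function names supported_metrics and has_metric are hypothetical. It only reads the help text, so it says nothing about whether a listed metric actually returns valid counts on a client CPU.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list the metrics advertised by `AMDuProfPcm -h`.
# Assumes the help text contains a line of the form:
#   Supported METRIC's are - <dc | fp | ipc | l1 | l2 | l3 | memory | tlb>

# Read help text on stdin and print one metric name per line.
supported_metrics() {
  sed -n "s/.*Supported METRIC's are - <\([^>]*\)>.*/\1/p" | tr -s ' |' '\n\n'
}

# Succeed only if metric $1 appears in the help text on stdin.
has_metric() {
  supported_metrics | grep -qx -- "$1"
}

# Print the advertised metrics if the tool is on PATH.
if command -v AMDuProfPcm >/dev/null 2>&1; then
  AMDuProfPcm -h | supported_metrics
fi
```

For example, AMDuProfPcm -h | has_metric l3 && echo "l3 is listed" reports whether the l3 metric is advertised.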

dm_1 · 02-07-2023

So this is what I got:

AMD Perfomance Counter Monitor
AMDuProfPcm is a command-line tool to monitor CPU performance metrics of AMD processors.
Usage: AMDuProfPcm [<OPTIONS>] -- <PROGRAM> [<ARGS>]

OPTIONS
 -m <METRIC,..>                 Metrics to report.
                                Supported METRIC's are - <dc | fp | ipc | l1 | l2 | l3 | memory | tlb>
                                Note: These metrics are applicable only with the built-in config file.
 -c <core|ccd|package=<n>       Collect from the specified core | ccd | package. Default is 'core=0'.
                                If 'ccd' is specified:
                                    core events will be collected from all the cores of this die.
                                    l3 events will be collected from the first core of all the ccx's of this die.
                                    df events will be collected from the first core of this die.
                                If 'package' is specified:
                                    core events will be collected from all the cores of this package.
                                    l3 events will be collected from the first core of all the ccx's of this package.
                                    df events will be collected from the first core of all the die's of this package.
 -a                             Collect from all the cores. Note: Options -c and -a cannot be used together.
 -i <config file>               User defined XML config file that specifies Core|L3|DF counters to monitor.
                                Refer sample files at <install-dir>/bin/Data/Config/ dir for the format.
                                Note: Options -i and -m cannot be used together. If option -i is used, all the events 
                                      mentioned in the user-defined config file will be collected.
 -d <seconds>                   Profile duration to run.
 -t <multiplex interval in ms>  Interval in which pmc count values will be read. Minimum is 16ms.
 -k                             Prefix 'pkg' in package level counters. Available only for package level counters
 -o <output file>               Output file name.
 -D <dump file>                 Output file that contains the event count dump.
 -p <n>                         Set precision of the metrics reported. Default is 2.
 -q                             Hide CPU topology reported.
 -r                             To force reset the MSRs.
 -s                             Display time stamp in the time series report
 -I <ms>                        Prints the metrics at regular interval. Enabled by default with 1000ms interval
 -C                             Prints the cumulative metrics at the end of the profile duration. Otherwise, all the
                                samples will be reported as timeseries data.
 -A <system,package,ccd,core>
                                Prints aggregated metrics at specified component level
 -l                             List supported Raw PMU events.
 -z <pmc-event>                 Print the name, description and available unit masks for the event.
 -x <core-id,..>                Core affinity for launched application. Comma separated list of core-id's.
 -w <dir>                       Specify the working directory. Default will be the path of the launched application.
 -n                             Print cpu numa topology.
 -v                             Print version.
 -h                             Print help.

PROGRAM                         The application to be launched before collecting the profile data.
ARGS                            The list of arguments for the launch application.

*** pre-release features ***

MONITORING WITHOUT ROOT PRIVILEGES:
    The pre-release option -X can be used to monitor the metrics without having the dependency on "msr" module and root access.
    This option can be used to collect Core, L3 and DF PMC events on Zen based processors. Newer proceesors may require
    latest kernels supporting those newer processor models.

    EXAMPLES:
    Timeseries monitoring of ipc of a benchmark. Aggregate metrics per thread.
      $ AMDuProfPcm -X -m ipc -o /tmp/pcm.csv -- AMDTClassicMatMul-bin

    Timeseries monitoring of ipc of a benchmark. Aggregate metrics per processor package.
      $ AMDuProfPcm -X -m ipc -A package -o /tmp/pcm.csv -- AMDTClassicMatMul-bin

    Timeseries monitoring of ipc of a benchmark. Aggregate metrics at the system level.
      $ AMDuProfPcm -X -m ipc -A system -o /tmp/pcm.csv -- AMDTClassicMatMul-bin

    Cumulative reporting of ipc metrics at the end of the benchmark execution.
      $ AMDuProfPcm -X -m ipc -C -o /tmp/pcm.csv -- AMDTClassicMatMul-bin

    Cumulative reporting of ipc metrics at the end of the benchmark execution. Aggregate metrics per processor package.
      $ AMDuProfPcm -X -m ipc -C -A package -o /tmp/pcm.csv -- AMDTClassicMatMul-bin

    Cumulative reporting of ipc metrics at the end of the benchmark execution. Aggregate metrics at system level.
      $ AMDuProfPcm -X -m ipc -C -A system -o /tmp/pcm.csv -- AMDTClassicMatMul-bin

    Timeseries monitoring of memory bandwidth reporting at package and memory channels level.
      $ AMDuProfPcm -X -m memory -a -A system,package -o /tmp/mem.csv

    Timeseries monitoring of level 1 and level 2 topdown metrics (pipeline utilization)
      $ AMDuProfPcm -X -m topdown -A system -o /tmp/td.csv -- AMDTClassicMatMul-bin

    Cumulative reporting of level 1 and level 2 topdown metrics (pipeline utilization)
      $ AMDuProfPcm -X -m topdown -C -A system -o /tmp/td.csv -- AMDTClassicMatMul-bin

    For better topdown results disable NMI watchdog. As root, run "echo 0 > /proc/sys/kernel/nmi_watchdog"

ROOFLINE MODEL:
        An easy to visualize performance model that can be used to characterize a benchmark or application on an architecture.
    This performance model helps to identify whether a benchmark is memory-bound or compute-bound.
    Following steps can be used to generate a roofline graph of an application.

    To collect profile data
       # AMDuProfPcm roofline -o /tmp/myapp-rl.csv -- <PROGRAM> [<ARGS>]

    To collect data in non-root mode
       $ AMDuProfPcm roofline -X -o /tmp/myapp-rl.csv -- <PROGRAM> [<ARGS>]

    To plot roofline
       $ AMDuProfModelling.py -i /tmp/myapp-rl.csv -o /tmp/

    The above command will generate the roofline plot in a PDF file and save it in /tmp/ dir.
    Run "AMDuProfModelling.py -h" for the options supported by this script.

********
EXAMPLES:
  1. Run the command and collect ipc performance metrics and print in regular interval (1000ms) in console
     # AMDuProfPcm -m ipc -c core=0 -- /usr/bin/taskset -c 0 AMDTClassicMatMul-bin

  2. Collect ipc metrics from core 0 for duration of 10 seconds and in regular interval save profile data in a file
     # AMDuProfPcm -m ipc -c core=0 -d 10 -o /tmp/perf-overview.csv

  3. Collect memory bandwidth for all the memory channels and save the profile data in a file
     # AMDuProfPcm -m memory -a -d 10 -o /tmp/perf-overview.csv
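The -x option above expects a comma-separated list of core IDs, while the kernel reports CPU sets in range form (the "On-line CPU(s) list" line of lscpu and the sysfs shared_cpu_list files print values like 0-5,12-17). Below is a minimal bash sketch of a converter; expand_cpulist is a hypothetical name, and whether -x also accepts ranges directly is not stated in the help text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a Linux cpulist such as "0-5,12-17" into the
# plain comma-separated form "0,1,2,3,4,5,12,13,14,15,16,17".
# Does not handle the "a-b:stride" form used in some kernel parameters.

expand_cpulist() {
  local list=$1 part i
  local -a out=()
  local IFS=,            # split the input on commas, and join output with them
  for part in $list; do
    if [[ $part == *-* ]]; then
      # Range "a-b": emit every CPU id from a to b inclusive.
      for ((i = ${part%-*}; i <= ${part#*-}; i++)); do
        out+=("$i")
      done
    else
      out+=("$part")
    fi
  done
  # "${out[*]}" joins elements with the first character of IFS (a comma).
  echo "${out[*]}"
}

expand_cpulist 0-5,12-17   # prints 0,1,2,3,4,5,12,13,14,15,16,17

# Possible (untested) use with a hypothetical program ./my_mcmc, pinning it to
# CPUs 0 and 1 while collecting L3 metrics for CCD 0:
#   AMDuProfPcm -m l3 -c ccd=0 -x "$(expand_cpulist 0-1)" -- ./my_mcmc
```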

santosh_zanjurne · 02-07-2023

Hi,

Can you share the output of the lscpu command? I do see that the memory and l3 metrics are supported. Do you think we could meet online and discuss? Just drop an email to toolchainsupport@amd.com to take this further. Thanks.

dm_1 · 02-08-2023

Sure, email sent!

L3 cache profiling on Ryzen 5900x