Please use uProf v1.1, and use AMDuProfCLI instead of AMDCpuProfiler, which is deprecated in v1.1.
Use event=0xb006 for the 'L3 Miss' event.
The list of supported events can be displayed with the command: ./AMDuProfCLI collect --list cpu-events
You need to combine the L3 event with TBP (time-based profiling).
Here is a sample command that collects "L3 miss" samples in SWP (system-wide profiling) mode:
./AMDuProfCLI collect -e event=timer,interval=1 -e event=0xb006,umask=0x01,slicemask=0xF,threadmask=0xFF -a -d 10 -o /tmp/out
I'm trying to do something similar: profile my application for L3 caching issues on my Threadripper 1950X CPU on Windows 10.
Using AMDuProfCLI, the sample command you gave seems to work, but I can't find a way to inspect or analyze the collected samples that is actually useful.
The only evidence that anything was collected is in the generated .CSV file, which has a section that looks like:
L3/DF PROFILE REPORT
TimeStamp,L3 miss(CCX0),L3 miss(CCX1),L3 miss(CCX2),L3 miss(CCX3)
Is there any way to connect these events back to specific threads and instructions?
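In case it helps anyone reading the report by hand, here is a rough sketch of totalling each CCX column in that section with awk. It only relies on the section title and header row quoted above; the layout of the data rows is an assumption (comma-separated numbers directly under the header, with the section ending at a blank line or end of file), and sum_l3_misses is just a made-up helper name, not part of uProf.

```shell
# Sum every "L3 miss(CCXn)" column of the L3/DF PROFILE REPORT section.
# Assumption: data rows follow the header directly and the section
# ends at a blank line or at end of file.
sum_l3_misses() {
    awk -F',' '
        { sub(/\r$/, "") }                                  # tolerate Windows line endings
        /^L3\/DF PROFILE REPORT/ { in_section = 1; next }   # section title
        in_section && /^TimeStamp,/ {                       # header row: remember column names
            ncols = NF
            for (i = 2; i <= NF; i++) name[i] = $i
            have_header = 1
            next
        }
        in_section && have_header && NF == 0 { exit }       # blank line closes the section
        in_section && have_header {
            for (i = 2; i <= ncols; i++) total[i] += $i     # column 1 is the timestamp
        }
        END {
            for (i = 2; i <= ncols; i++) printf "%s: %.0f\n", name[i], total[i]
        }
    ' "$1"
}
```

Calling sum_l3_misses on the report file prints one total per CCX column. This only aggregates the per-CCX counts over time; it does not attribute misses to threads or instructions.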
As of now, only chronological L3 events are reported. On family 0x17 processors with SMT enabled, 8 hardware threads share a single L3 cache within a CCX. The existing L3 events don't provide attribution to software threads or instructions. You can, however, restrict L3 events to specific hardware threads or cores. Each bit in threadmask corresponds to a hardware thread within the CCX; with threadmask set to 0xFF, L3 events are collected for all 8 threads. You can set it to cover a single core and set your application's affinity to that core. That way you might get some useful information regarding L3 events.
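To make the threadmask suggestion concrete, here is a small sketch that computes a mask and prints a collect command using only the options from the sample command earlier in this thread. The bit layout is an assumption that this thread does not confirm: bit index = 2 * core + SMT thread, with core 0..3 and SMT thread 0..1 inside the CCX. Please check it against the documentation for your processor. The function names are made up for illustration.

```shell
# Mask for one hardware thread inside a CCX.
# Assumption: bit index = 2 * core + smt (core 0..3, smt 0..1).
l3_threadmask() {
    core=$1
    smt=$2
    printf '0x%02X\n' $(( 1 << (2 * core + smt) ))
}

# Mask covering both SMT threads of one core (same assumed layout).
l3_coremask() {
    core=$1
    printf '0x%02X\n' $(( 3 << (2 * core) ))
}

# Print (do not run) a collect command restricted to core 1 of each CCX.
# Every option is copied from the sample command; only threadmask changes.
mask=$(l3_coremask 1)
echo "./AMDuProfCLI collect -e event=timer,interval=1 -e event=0xb006,umask=0x01,slicemask=0xF,threadmask=${mask} -a -d 10 -o /tmp/out"
```

The application then has to be pinned to the matching core, for example with taskset -c on Linux or start /affinity on Windows. How the operating system's logical CPU numbers map to a CCX, core and SMT thread depends on the system, so that mapping has to be looked up separately.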
Thanks Swarup, it does work! Do you know if there is any way I can collect memory bandwidth data while I run my application?
Memory bandwidth profiling is not yet supported by uProf.