Could anyone please help me work out what exactly the error means and how I can go about solving it? I'm running AMDuProf remotely over SSH on a machine with an AMD EPYC 7473X 24-Core Processor, if that is of any use.
My main goal is to create a time-series report, at a fixed interval, of all the metrics available from the CPU (ranging from frequency and temperature down to cache hit/miss rates and other CPU-specific metadata). Is uProf timechart the right tool for this? (I just want to confirm my approach before debugging an issue that may turn out to be irrelevant if there's a better metric-logging method available.)
Hello FortemDave,
I think there is some confusion between time-series data and the timechart command. You can use the system analysis tool AMDuProfPcm to collect time-series data for various metrics, e.g. l1, l2, l3, memory, pcie.
You can use the application analysis tool AMDuProf, in both CLI and GUI mode, to collect Power/Frequency/Temperature/P-State data. These metrics are part of the timechart command.
Please refer to the last line on page 54 regarding multiplexing. Let me know if this helps. You can also drop an email to toolchainsupport at amd dot com and we can discuss further.
Hey! Thank you for your reply.
I just wanted to ask whether it's possible to capture both of these sets of metrics, which come from different commands, at once (i.e. running uProfPCM and the CLI at the same time, or using something that captures both together). Thank you.
Hi FortemDave, it is not recommended to run more than one profiling session at a time, as it may lead to unexpected behavior.
So how do I go about capturing multiple metrics at once? Running two PCM Timechart commands at once has worked fine so far. Once my CLI issue is fixed, I'll see if it works. If not, I'll have to look for another solution ASAP as I
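If two tools do end up recording over the same time window, one generic way to get a single time series is to align their samples afterwards with an "as-of" join: for each sample from one tool, take the most recent sample from the other tool that is no older than a chosen tolerance. The sketch below is not a uProf feature. It assumes each tool's CSV has already been parsed into sorted (timestamp_ms, value) pairs; the sample values, the 30 ms offset and the `asof_merge` name are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in ms, value); hypothetical parsed form


def asof_merge(left: List[Sample], right: List[Sample],
               tolerance_ms: float) -> List[Tuple[float, float, Optional[float]]]:
    """For each left sample, attach the latest right sample taken at or before it,
    if that sample is at most tolerance_ms old; otherwise attach None.
    Both inputs must already be sorted by timestamp."""
    right_ts = [t for t, _ in right]
    merged = []
    for t, v in left:
        i = bisect_right(right_ts, t) - 1  # index of last right sample with ts <= t
        if i >= 0 and t - right_ts[i] <= tolerance_ms:
            merged.append((t, v, right[i][1]))
        else:
            merged.append((t, v, None))
    return merged


if __name__ == "__main__":
    # Hypothetical data: a cache metric every 100 ms, and a power reading
    # every 100 ms that started 30 ms later.
    cache_metric = [(0.0, 0.12), (100.0, 0.15), (200.0, 0.11)]
    socket_power = [(30.0, 180.5), (130.0, 182.0)]
    for row in asof_merge(cache_metric, socket_power, tolerance_ms=100.0):
        print(row)
```

With these inputs the first row has no matching power sample (None), and the next two pick up the readings taken 70 ms earlier. The tolerance prevents a stale reading from being attached when one tool stalls.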
Regarding the error, have you installed the Power Profiling driver? It is documented on page 24, "Installing Power Profiling Driver on Linux".
Yes, I've tried that, to no avail. Is there any other solution, or a log I can submit, that might help you find the issue?
Can you please share the output of the commands below?
AMDuProfCLI info --system
AMDuProfCLI timechart --list
AMDuProfCLI timechart --event power --interval 100 --duration 10
AMDuProfCLI timechart --event Temperature --interval 100 --duration 10
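To collect this output in one file for sharing, a small script can run the four commands above one after another (never concurrently, in line with the advice against overlapping profiling sessions) and save stdout, stderr and the exit code of each. This is a minimal sketch that assumes `AMDuProfCLI` is on the PATH. The flags are exactly the ones listed above, while the log file name and the 120-second timeout are arbitrary choices. The timeout is there because a timechart run that stalls partway through would otherwise block the script indefinitely.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# The diagnostic commands requested above, run strictly one at a time.
DIAGNOSTIC_COMMANDS: List[List[str]] = [
    ["AMDuProfCLI", "info", "--system"],
    ["AMDuProfCLI", "timechart", "--list"],
    ["AMDuProfCLI", "timechart", "--event", "power", "--interval", "100", "--duration", "10"],
    ["AMDuProfCLI", "timechart", "--event", "Temperature", "--interval", "100", "--duration", "10"],
]


def run_diagnostics(log_path: Path, dry_run: bool = False, timeout_s: int = 120) -> List[str]:
    """Run each command sequentially and write all output to log_path.
    With dry_run=True, only the command lines are recorded."""
    sections = []
    for cmd in DIAGNOSTIC_COMMANDS:
        header = "$ " + shlex.join(cmd)  # shlex.join requires Python 3.8+
        if dry_run:
            sections.append(header)
            continue
        try:
            # subprocess.run kills the child if the timeout expires
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
            body = f"{result.stdout}{result.stderr}[exit code {result.returncode}]"
        except FileNotFoundError:
            body = "[AMDuProfCLI not found on PATH]"
        except subprocess.TimeoutExpired:
            body = f"[timed out after {timeout_s} s]"
        sections.append(f"{header}\n{body}")
    log_path.write_text("\n\n".join(sections) + "\n")
    return sections


if __name__ == "__main__":
    # Hypothetical output file name; attach it to the support email.
    run_diagnostics(Path("uprof_diagnostics.txt"))
```

Passing `dry_run=True` first is a cheap way to check the command lines before anything is actually profiled.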
Sure:
[OS Info]
OS Details : LinuxUbuntu 20.04.6 LTS-64
Distribution Details : debian 20.04
Kernel Details : 5.15.0
[CPU Info]
AMD Cpu : Yes
Family : 0x19
Model : 0x01
Stepping : 0x2
Local APIC : Yes
Socket Count : 2
SMT Enabled : Yes
Threads per Core : 2
Threads per CCX : 6
Threads per Package : 48
Total number of Threads : 96
[PERF Features Availability]
Core PMC : Yes
L3 PMC : Yes
DF PMC : Yes
PERF TS : No
[IBS Features Availability]
IBS : Yes
IBS Fetch Sampling : Yes
IBS OP Sampling : Yes
IBS FetchCtlExtd : Yes
IBS ExtCount : Yes
IBS Dispatch : Yes
IBS BrTgtAddr : Yes
IBS OpData4 : No
[RAPL/CEF Features Availability]
RAPL : Yes
APERF & MPERF : Yes
Read Only APERF & MPERF : Yes
IRPERF : Yes
HW P-State Control : Yes
[PERF features supported by OS]
TBP Supported : Yes
EBP Supported : Yes
IBS Supported : Yes
IRPERF Supported : Yes
APERF Supported : Yes
MPERF Supported : Yes
BPF Supported : No
BCC Installed : No
Perf Event Paranoid : -1
Perf Event Max Mlock : 516 KB
Perf Event Max Stack : 127
[Hypervisor Info]
Hypervisor Enabled : No
AMDuProfCLI timechart --list output:
Supported Devices:-
Device Name Instance
----------- --------
Socket [ 0 - 1 ]
Core [ 0 - 47 ]
Thread [ 0 - 95 ]
Supported Counter Categories:-
Category Supported Device Type
-------- ---------------------
Power [ Socket, Core ]
Frequency [ Thread ]
Temperature [ Socket ]
P-State [ Thread ]
When I run the Power and Temperature commands, the millisecond counter gets stuck partway through the run. This could be due to load on the server, but I've checked with htop and the CPU is mostly idle, so I'm not sure why the logging gets interrupted.
The Frequency command doesn't show the millisecond counter and outputs a CSV containing only the metadata. No values are logged, so the tables are blank.
P-State logging gives the error "Could not enable the counters".
That's everything so far.
Hi @FortemDave
Thank you for sharing the details.
We will look into this and update you.
Hello @FortemDave
We currently do not have a system with the exact specifications you mentioned, so we are unable to reproduce this issue.
Could we meet online to discuss it?
Please email us at toolchainsupport@amd.com and we can look into it further.