AMDuProfCLI hangs during report generation
I am running AlmaLinux on an Azure VM with EPYC 7V73X CPUs.
I am trying to profile an MPI application and generate session data that I can import into the GUI for further analysis.
The collect phase completes without issue. I use the following bash commands:
mpirun -machinefile ./hostfile.txt -np 120 $PIN_PROCESSOR_LIST --rank-by slot -mca coll ^hcoll -x LD_LIBRARY_PATH -x PATH -x PWD $PROF_CMD ./MonteCarlo
Hello,
Because you are profiling all the ranks, uProf collects data for every rank, and the resulting data set can be very large. This increases the processing overhead. Refer to chapter 8.2 of the AMD uProf User Guide for profiling a single rank, and let me know if you still see the issue.
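For readers without the guide to hand, here is a minimal sketch of what single-rank collection might look like with Open MPI's colon-separated MPMD syntax: the first app context launches one rank under AMDuProfCLI, and the second launches the remaining ranks unprofiled. The helper name `single_rank_cmd` and its arguments are hypothetical, and the collect flags are the ones quoted elsewhere in this thread. The function only prints the command so it can be checked against chapter 8.2 before anything is run.

```shell
# Print an Open MPI MPMD command line that profiles one rank only.
# single_rank_cmd is a hypothetical helper, not part of uProf.
# Usage: single_rank_cmd TOTAL_RANKS OUTPUT_DIR APP [APP_ARGS...]
single_rank_cmd() {
    total=$1
    outdir=$2
    shift 2
    rest=$((total - 1))
    # First app context: a single rank wrapped by the profiler.
    cmd="mpirun -np 1 AMDuProfCLI collect --config tbp --mpi --output-dir $outdir $*"
    # Second app context: the remaining ranks run the application directly.
    if [ "$rest" -gt 0 ]; then
        cmd="$cmd : -np $rest $*"
    fi
    printf '%s\n' "$cmd"
}

# Example: 48 ranks in total, one of them profiled.
single_rank_cmd 48 Profile.uProf ./fv3.exe
```

Printing rather than executing keeps the sketch safe to try on a login node; any binding or `-mca` options from the real job would still need to be added by hand.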
Santosh,
I see the same issue on my system. Profiling a single rank is not a solution, as that prevents seeing load imbalance among the ranks.
In my case, I let the "AMDuProfCLI report" command run for more than 90 minutes without seeing any progress. On the other hand, the GUI was able to process the same data in only a few minutes and present the results. Can you explain the different behavior?
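When reproducing a stall like this, it can help to bound the run and record how long it took. The following is a minimal sketch using GNU coreutils `timeout`, which exits with status 124 when the limit is reached. The helper name `run_with_limit` is hypothetical, and the `-i` input flag in the usage comment is an assumption that should be checked against the tool's own help output.

```shell
# Run a command with a wall-clock limit and report how it ended.
# run_with_limit is a hypothetical helper; `timeout` comes from GNU coreutils.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    limit=$1
    shift
    start=$(date +%s)
    timeout "$limit" "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$status" -eq 124 ]; then
        # 124 is the status timeout uses when the limit expired.
        echo "timed out after ${elapsed}s: $*" >&2
    else
        echo "exit status $status after ${elapsed}s: $*" >&2
    fi
    return "$status"
}

# Example (the -i flag is an assumption; verify it before use):
# run_with_limit 1800 AMDuProfCLI report -i Profile.uProf
```

Comparing the elapsed time across data sets of different sizes gives a concrete number to include in a bug report.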
Hello Dan, profiling an MPI application using the GUI is not recommended. You can profile all the ranks using the CLI. To help us understand the issue, can you try profiling your application with a smaller input size and fewer ranks? This reduces the amount of profile data and the processing overhead. Make sure to use the --mpi flag. Please share your command if you still see the issue.
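One way to act on this suggestion is to collect at a few small rank counts and see where report generation starts to slow down. The sketch below only prints the collect commands. The helper name `sweep_cmds` and the `Profile.npN` directory names are hypothetical, and the flags are limited to the ones quoted in this thread.

```shell
# Print collect commands for a series of small rank counts.
# sweep_cmds is a hypothetical helper, not part of uProf.
# Usage: sweep_cmds APP RANK_COUNT...
sweep_cmds() {
    app=$1
    shift
    for np in "$@"; do
        # One output directory per rank count keeps the sessions separate.
        printf 'mpiexec -np %s AMDuProfCLI collect --config tbp --mpi --output-dir Profile.np%s %s\n' \
            "$np" "$np" "$app"
    done
}

# Example: emit commands for 2, 4 and 8 ranks.
sweep_cmds ./fv3.exe 2 4 8
```

Running the report step on each resulting directory in turn shows whether processing time grows smoothly with rank count or stalls past some threshold.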
Santosh,
To be clear, I collected all the data using the CLI. The CLI takes a long time to generate the report. The GUI only takes a few minutes to process the collected data and present it.
Hi Dan,
Can you share your MPI command line? Please also share the output of:
AMDuProfCLI.exe info --system
I am using OpenMPI-4.1.4. OpenMPI and the application code are compiled with Intel compilers.
mpiexec --bind-to core -np 48 --report-bindings AMDuProfCLI collect --config tbp --mpi --output-dir Profile.uProf ./fv3.exe >& out
AMDuProfCLI info --system
/home/dkokron/play/AMDuProf/AMDuProf_Linux_x64_4.0.341/bin/AMDuProfCLI
[OS Info]
OS Details : LinuxUbuntu 22.04.1 LTS-64
Distribution Details : debian 22.04
Kernel Details : 5.15.0
[CPU Info]
AMD Cpu : Yes
Family : 0x17
Model : 0x31
Stepping : 0x0
Local APIC : Yes
Socket Count : 2
SMT Enabled : No
Threads per Core : 1
Threads per CCX : 3
Threads per Package : 24
Total number of Threads : 48
[PERF Features Availability]
Core PMC : Yes
L3 PMC : Yes
DF PMC : Yes
PERF TS : No
[IBS Features Availability]
IBS : Yes
IBS Fetch Sampling : Yes
IBS OP Sampling : Yes
IBS FetchCtlExtd : Yes
IBS ExtCount : Yes
IBS Dispatch : Yes
IBS BrTgtAddr : Yes
IBS OpData4 : No
[RAPL/CEF Features Availability]
RAPL : Yes
APERF & MPERF : Yes
Read Only APERF & MPERF : Yes
IRPERF : Yes
HW P-State Control : Yes
[PERF features supported by OS]
TBP Supported : Yes
EBP Supported : Yes
IBS Supported : Yes
IRPERF Supported : Yes
APERF Supported : Yes
MPERF Supported : Yes
BPF Supported : No
BCC Installed : No
Perf Event Paranoid : -1
Perf Event Max Mlock : 516 KB
Perf Event Max Stack : 127
[Hypervisor Info]
Hypervisor Enabled : No
I am working on this. Will get back to you.