I successfully profiled an MPI application (8 MPI ranks) with AMDuProf_Linux_x64_3.3.462:
> srun -n 8 AMDuProfCLI collect --config assess --mpi --output-dir amdprof_out_jusuf bt-mz.C.8
But the summarization step crashed:
> AMDuProfCLI-bin report --detail --verbose 2 -i amdprof_out_jusuf/*
> Translation started ...
> [TRANSLATION PROGRESS] 100% Translation done...
> Translation done...
> Report generation started ...
> Generating report file...
>
> terminate called after throwing an instance of 'std::system_error'
> what(): Resource deadlock avoided
> Aborted
Do you know how to solve this issue?
Users of uProf need to go here: https://community.amd.com/t5/newcomers-start-here/bd-p/newcomer-forum to get access to AMD Server Gurus, where this specific software is moderated: https://community.amd.com/t5/server-gurus/ct-p/amd-server-gurus
Yes, for AMD uProf related support, the AMD Server Gurus community is the best place to post any query or issue. I'm moving this post there.
Just to clarify one point: AMD Server Gurus is not part of the Devgurus community and is independently moderated.
Thanks for the clarification. I do not see my original question here. Should I post it again or copy and paste it here? Sorry for the stupid question.
Thank you for that information. I wasn't sure whether the OP needed to get whitelisted to go to Server Gurus, since another user once mentioned that he couldn't get access a while back.
Hi @izhukov,
Please use the following command to generate the report for MPI application profiling and let us know if it resolves the issue.
AMDuProfCLI report --detail --input-dir amdprof_out_jusuf
I ran the application again:
> srun -n 8 AMDuProfCLI collect --config assess --mpi --output-dir amdprof_out_jusuf bt-mz.C.8
It was successful, with the following additional output from the profiler:
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30958.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30961.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30959.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30570.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30573.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30572.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30960.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30571.caperf
Unfortunately, the suggested command failed too:
> AMDuProfCLI report --detail ./amdprof_out_jusuf
> ./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
> Report generation started ...
>
> ERROR: Report Generation Failed...
I run the measurements on a compute node, but report generation is on a login node.
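Since the raw files in the log above come from two hosts (jsfc176 and jsfc177) and the report is generated on a different node, it may be worth confirming that every rank's raw file is visible from the node running the report. The following is only a minimal sketch: it assumes the file naming pattern AMDuProf-<host>-<date>-<pid>.caperf seen in the output above and a host name without '-'. The function name count_caperf_per_host is hypothetical and not part of uProf.

```shell
#!/bin/sh
# Count raw .caperf files per host in a uProf output directory.
# Assumption: file names look like
#   AMDuProf-<host>-<Mon>-<DD>-<YYYY>_<HH>-<MM>-<SS>-<pid>.caperf
# and <host> contains no '-' character.
count_caperf_per_host() {
    dir=$1
    for f in "$dir"/AMDuProf-*.caperf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # strip the directory part
        rest=${name#AMDuProf-}           # drop the fixed prefix
        printf '%s\n' "${rest%%-*}"      # host is everything up to the next '-'
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Called as `count_caperf_per_host amdprof_out_jusuf`, it prints one line per host with the number of raw files found for it, which can be compared against the number of ranks launched on that host.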
Hi @izhukov ,
It looks like the "--input-dir" option was missing. Please try again with the "--input-dir" option. If you are using different nodes for compute and login, you may need to use the "--host" option.
Example:
$ AMDuProfCLI report --detail --host all --input-dir ./amdprof_out_jusuf
For more details on the "--host" option for report generation, please refer to the "User Guide", section 7.3.2:
https://developer.amd.com/wordpress/media/files/AMDuprof_Resources/User_Guide_AMD_uProf_v3.3_GA.pdf
+ srun -n 8 ./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI collect --config assess --mpi --output-dir ./amdprof_jureca_1n ./bt-mz.C.8
+ ./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI report --detail --input-dir ./amdprof_jureca_1n
Generating report file...
terminate called after throwing an instance of 'std::system_error'
what(): Resource deadlock avoided
Still the same error, on both login and compute nodes. The "--host" option didn't help. There is a core file, but it is useless, as AMDuProf was not built with source code information.
Hi @izhukov,
It would be helpful if you could provide the following information.
Hi @swarup,
thank you for your assistance.
Here is the requested information.
Here is the link to the measurements, stdout, stderr and batch script:
> https://gigamove.rz.rwth-aachen.de/d/id/izALLuqh67mye2
Hi @izhukov,
Thank you for providing the details. We will get back to you with our findings.
Hi @izhukov,
We analyzed the issue. The value returned by one of the ParaStationMPI APIs (used to get the value of an environment variable) caused the issue. The issue is limited to the uProf command line; the uProf GUI application works fine. We have a fix for this, and it should be available in the next release of uProf. In the meantime, you can view the report using the GUI. After translation, you should see a .db file in the output directory. Launch the GUI, go to HOME > 'Import Session' > 'Import Profile Session' > 'Profile Data File' > Browse and choose the .db file, then press the 'Open Session' button. This should generate the report in the GUI.
Let us know if this works for you.
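To locate the database file for the GUI import, something like the following could be used. It is a sketch that assumes only what is stated above, namely that translation leaves a file with a .db extension somewhere under the output directory. The function name find_session_db is hypothetical.

```shell
#!/bin/sh
# List .db files under a uProf output directory. These are the candidates
# for the 'Profile Data File' field in the GUI's import dialog.
find_session_db() {
    find "$1" -type f -name '*.db' | sort
}
```

For example, `find_session_db ./amdprof_jureca_1n` prints the path(s) to select in the Browse dialog; no output means translation did not produce a .db file in that directory.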
Hi @swarup,
thank you for providing a workaround. I'm looking forward to the new release.
I do not think it is related to ParaStationMPI, as it crashes with IntelMPI and OpenMPI too. I can provide error logs if you wish.
I have additional questions regarding GUI usage. I understand that they are outside the scope of this post, but they relate to the same measurements and the same setup. Let me know if it is better to create a new post for these questions.
Here are the questions (GCC + ParaStationMPI test case):
1) I do not see MPI routines called from user code, although "--mpi" was enabled. Are they intercepted?
2) I do not see OpenMP as I do not compile with Clang. Do you plan to change it in the future and enable it with other compilers?
3) The "-g" flag was provided to AMDuProfCLI to enable the call graph, but it is empty (see picture). "adi_" should include many other functions.
Hi @izhukov ,
We could not observe the crash using OpenMPI. It would be helpful if you could share the error logs for IntelMPI and OpenMPI for analysis. Regarding your other queries:
1 & 3) Please use the '--call-graph fpo:512' option with the 'collect' command instead of '-g' to get a better call stack. The user guide has more information on the '--call-graph' option.
2) GCC 10 does not support OpenMP 5.0 completely. As soon as the next release of GCC comes with the required support, we will enable it for GCC as well. Right now, OpenMP tracing is only supported on a single node. In the next release, we will support multi-node setups.
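For reference, the collect command from earlier in this thread with '-g' swapped for '--call-graph fpo:512' might look like the sketch below. Every option is taken from this thread, but the position of '--call-graph' among the other options is an assumption, so please check it against the user guide. The helper print_collect_cmd is hypothetical and only prints the command line instead of running it.

```shell
#!/bin/sh
# Print (do not run) a collect command that uses '--call-graph fpo:512'
# instead of '-g'. All options come from this thread; their relative
# order is an assumption and should be verified against the user guide.
print_collect_cmd() {
    ranks=$1
    outdir=$2
    app=$3
    printf 'srun -n %s AMDuProfCLI collect --config assess --mpi --call-graph fpo:512 --output-dir %s %s\n' \
        "$ranks" "$outdir" "$app"
}
```

Running `print_collect_cmd 8 amdprof_out_jusuf bt-mz.C.8` prints the full command line, which can be reviewed and then pasted into the batch script.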
Hi @swarup,
thanks for the prompt reply.
Please see the error logs here (in the filename suffix, the first letter 'i' stands for the Intel compiler and the second one stands for the MPI implementation: i=IntelMPI, o=OpenMPI). I noticed that the crash happens with "assess", but the run completes successfully for "tbp" and OpenMPI.
The '--call-graph fpo:512' option helps to see user functions in the call graph/flame graph, but there are no MPI routines in the calls. Is there any way to sort columns in the call graph table, as in the metric pane?