
Server Gurus Discussions

izhukov
Adept I

AMDuProf crashes at report generation

I successfully profiled an MPI application (8 MPI ranks) with AMDuProf_Linux_x64_3.3.462:

> srun -n 8 AMDuProfCLI collect --config assess --mpi --output-dir amdprof_out_jusuf bt-mz.C.8

But report generation crashed:

> AMDuProfCLI-bin report --detail --verbose 2 -i amdprof_out_jusuf/*

> Translation started ...
> [TRANSLATION PROGRESS] 100% Translation done...
> Translation done...
> Report generation started ...
> Generating report file...
>
> terminate called after throwing an instance of 'std::system_error'
> what(): Resource deadlock avoided
> Aborted

Do you know how to solve this issue?

0 Likes
15 Replies

uProf users need to go here: https://community.amd.com/t5/newcomers-start-here/bd-p/newcomer-forum to get access to AMD Server Gurus, where this specific software is moderated: https://community.amd.com/t5/server-gurus/ct-p/amd-server-gurus

 

Yes, for AMD uProf related support, the AMD Server Gurus community is the best place to post any query or issue. I'm moving this post there.

Just to clarify one point: AMD Server Gurus is not part of the Devgurus community and is independently moderated.

0 Likes




Thanks for the clarification. I do not see my original question here. Should I post it again or copy/paste it here? Sorry for the stupid question :)

0 Likes

Thank you for that information. I wasn't sure whether the OP needed to get whitelisted to go to Server Gurus, since another user once mentioned that he couldn't get access a while back.

0 Likes
swarup
Staff

Hi @izhukov,

Please use the following command to generate the report for MPI application profiling and let us know if it resolves the issue.

 AMDuProfCLI report --detail --input-dir amdprof_out_jusuf

0 Likes




I ran the application again:

> srun -n 8 AMDuProfCLI collect --config assess --mpi --output-dir amdprof_out_jusuf bt-mz.C.8

It was successful, with the following additional output from the profiler:

> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30958.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30961.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30959.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30570.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30573.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30572.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc177-Dec-11-2020_08-24-36-30960.caperf
> Profile started ...
> Profile completed ...
> Generated raw file : amdprof_out_jusuf/AMDuProf-jsfc176-Dec-11-2020_08-24-36-30571.caperf

Unfortunately, the suggested command failed too:

> AMDuProfCLI report --detail ./amdprof_out_jusuf
> ./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
> Report generation started ...
>
> ERROR: Report Generation Failed...

I run the measurements on a compute node, but report generation is done on a login node.
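Before generating the report, it can help to confirm that every rank actually produced a raw file. The following is a minimal sketch, assuming the raw files keep the `.caperf` extension shown in the profiler output above; `count_caperf` is a hypothetical helper, not part of AMDuProf.

```shell
#!/bin/sh
# Count the raw profile files (*.caperf) found under a profile output directory.
# count_caperf is a hypothetical helper name, not an AMDuProf command.
count_caperf() {
    # $1: profile output directory (the value given to --output-dir)
    find "$1" -type f -name '*.caperf' | wc -l | tr -d ' '
}

# Example use for the 8-rank run (directory name taken from the thread):
# [ "$(count_caperf amdprof_out_jusuf)" -eq 8 ] || echo "some ranks produced no raw file" >&2
```

With 8 MPI ranks, a count other than 8 would point at the collection step rather than at report generation.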

 

 
0 Likes

Hi @izhukov ,

It looks like the "--input-dir" option was missing. Please try again with the "--input-dir" option. If you are using different nodes for compute and login, you may also need the "--host" option.

Example-

$ AMDuProfCLI report --detail --host all --input-dir ./amdprof_out_jusuf 

For more details on the "--host" option for report generation, please refer to section 7.3.2 of the User Guide:

https://developer.amd.com/wordpress/media/files/AMDuprof_Resources/User_Guide_AMD_uProf_v3.3_GA.pdf
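To avoid dropping "--input-dir" by accident, the report step can be wrapped in a small function. This is only a sketch built from the options quoted in this thread; `generate_report` and the `AMDUPROF` variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Path to the CLI binary; override it if AMDuProfCLI is not on PATH,
# e.g. AMDUPROF=./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
AMDUPROF=${AMDUPROF:-AMDuProfCLI}

# generate_report is a hypothetical wrapper, not part of AMDuProf.
generate_report() {
    # $1: profile output directory, always passed through --input-dir
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 1
    fi
    "$AMDUPROF" report --detail --host all --input-dir "$1"
}

# Example use (directory name taken from the thread):
# generate_report ./amdprof_out_jusuf
```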

 

0 Likes



 


+ srun -n 8 ./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI collect --config assess --mpi --output-dir./amdprof_jureca_1n ./bt-mz.C.8
+ ./AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI report --detail --input-dir ./amdprof_jureca_1n
Generating report file...

terminate called after throwing an instance of 'std::system_error'
what(): Resource deadlock avoided

Still the same error on both login and compute nodes. The "--host" option didn't help. There is a core file, but it is useless because AMDuProf was not built with source code information.

0 Likes

Hi @izhukov,

It would be helpful if you could provide the following information.

  1. Which Linux-based OS is used here? Please also share the version details.
  2. Which compiler and version are used here?
  3. We are also trying to create a local setup to reproduce the issue. We assume you are using the NPB BT-MZ application. Are you using the 3.4.1-MZ version? Are any specific compiler flags/options used?
0 Likes

Hi @swarup,

thank you for your assistance.

Here is the requested information:

  1. CentOS 8
  2. GCC/9.3.0+ParaStationMPI/5.4.7-1 and Intel/2020.2.254-GCC-9.3.0+ParaStationMPI/5.4.7-1
  3. For my tests I use NPB3.3-MZ-MPI (bt-mz CLASS=C NPROCS=8), with the flags "-O -fopenmp" for GCC and "-O -qopenmp" for Intel

Here is the link to the measurements, stdout, stderr, and batch script:

> https://gigamove.rz.rwth-aachen.de/d/id/izALLuqh67mye2

 

 
0 Likes

Hi @izhukov

Thank you for providing the details. We will get back to you with our findings.

0 Likes

Hi @izhukov,

We analyzed the issue. The return value of one of the ParaStationMPI APIs (used to read an environment variable) caused it. The issue is limited to the uProf command line; the uProf GUI application works fine. We have a fix for this, and it should be available in the next release of uProf. In the meantime, you can view the report using the GUI. After translation, you should see a .db file in the output directory. Launch the GUI, go to HOME > 'Import Session' > 'Import Profile Session' > 'Profile Data File' > Browse, choose the .db file, and press the 'Open Session' button. This should generate the report in the GUI.

Let us know if this works for you.
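To locate the .db file to select in the GUI's Browse dialog, a search of the output directory is usually enough. This is a minimal sketch, assuming the file ends in `.db` as described above; its exact name and subdirectory are not stated in this thread, and `find_session_db` is a hypothetical helper, not part of uProf.

```shell
#!/bin/sh
# Print the path of every *.db file under a profile output directory.
# find_session_db is a hypothetical helper name, not a uProf command.
find_session_db() {
    # $1: profile output directory that was translated by the report step
    find "$1" -type f -name '*.db'
}

# Example use (directory name taken from the thread):
# find_session_db ./amdprof_jureca_1n
```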

Hi @swarup,

thank you for providing a workaround. And I'm looking forward to a new release.

I do not think it is related to ParaStationMPI, as it crashes with IntelMPI and OpenMPI too. I can provide error logs if you wish.

I have additional questions regarding GUI usage.  I understand that it is out of scope of this post, but it is still related to the same measurements and the same setup. Let me know if it is better to create a new post for these questions.

Here are the questions (GCC+ParaStationMPI testcase)

1) I do not see MPI routines called from user code, although "--mpi" was enabled. Are they intercepted?

2) I do not see OpenMP as I do not compile with Clang. Do you plan to change it in the future and enable it with other compilers?

3) The "-g" flag was provided to AMDuProfCLI to enable the call graph, but it is empty (see picture). "adi_" should include many other functions.

uprof.png

 
0 Likes

Hi @izhukov ,

We could not observe the crash using OpenMPI. It would be helpful if you can share the error logs for IntelMPI and OpenMPI for analysis. Regarding your other queries:

1 & 3) Please use the '--call-graph fpo:512' option with the 'collect' command instead of '-g' to get a better callstack. The User Guide has more information about the '--call-graph' option.

2) GCC 10 does not support OpenMP 5.0 completely. As soon as the next release of GCC comes with the required support, we will enable it for GCC as well. Right now, OpenMP tracing is supported only on a single node. In the next release, we will support multi-node setups.
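For reference, the earlier collect command with '--call-graph fpo:512' in place of '-g' might look like the sketch below. The position of the option on the command line is an assumption, and `profile_with_callgraph` and `DRY_RUN` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# profile_with_callgraph is a hypothetical wrapper, not part of AMDuProf.
profile_with_callgraph() {
    # $1: number of MPI ranks, $2: output directory, $3: application binary
    # Option placement is an assumption; all flags are the ones quoted in this thread.
    set -- srun -n "$1" AMDuProfCLI collect --config assess \
        --call-graph fpo:512 --mpi --output-dir "$2" "$3"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"   # print the command instead of running it
    else
        "$@"
    fi
}

# Example use (values taken from the thread):
# DRY_RUN=1 profile_with_callgraph 8 ./amdprof_jureca_1n ./bt-mz.C.8
```

Setting DRY_RUN=1 prints the command line so it can be checked before it is submitted to a compute node.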

0 Likes

Hi @swarup,

thanks for the prompt reply.

Please see the error logs here (in the filename suffix, the first letter 'i' stands for the Intel compiler and the second letter stands for the MPI implementation: i=IntelMPI, o=OpenMPI). I noticed that the crash happens with "assess", while report generation completes successfully with "tbp" and OpenMPI.

The '--call-graph fpo:512' option helps to see user functions in the call graph/flame graph, but there are no MPI routines among the calls. Is there any way to sort columns in the call graph table, as in the metric pane?

0 Likes