
Server Gurus Discussions

Journeyman III

AMD uProf with MPI application

I want to profile an MPI application with AMD uProf on an AMD EPYC 7452 server.

Here is the command I use to run the STREAM benchmark (MPI version, written in C):

mpirun -np 8 /opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI collect --config tbp --mpi --output-dir ~/stream/stream_mpi/tmp ./stream_flatmpi_c.exe

However, as the output below shows, profiling only starts after the program has finished.

/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI
/opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI

"Program execution result"

Profile started ...
Profile completed ...
Generated raw file : /../../../AMDuProf-epyc-Sep-15-2020_16-31-31-145050.caperf

Profile started ...
Profile completed ...
Generated raw file : /../../../AMDuProf-epyc-Sep-15-2020_16-31-31-145051.caperf

...

Do you have any solutions?

Staff

Re: AMD uProf with MPI application

Hi epyc-beginner,

We need more details to understand this issue.

What array size are you using? Also, you say you are using the flat MPI version of STREAM: what exactly does that mean? Are you running STREAM on all cores using only MPI ranks?

If you can share the complete log and the exact commands you used, that would be really helpful.
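
One possible way to capture the complete log together with the exact command line is sketched below. The helper name `run_and_log` and the log file name `uprof_run.log` are made up for illustration; only standard shell tools (`printf`, `tee`) are used, nothing specific to AMD uProf.

```shell
# Hypothetical helper: run a command and save its stdout and stderr to a
# log file while still printing them to the terminal.
run_and_log() {
    logfile=$1
    shift
    # Record the exact command line first so the log is self-describing.
    printf 'Command: %s\n' "$*" > "$logfile"
    # Merge stderr into stdout and append everything to the log.
    "$@" 2>&1 | tee -a "$logfile"
}

# Example usage (the log file name is an assumption):
# run_and_log uprof_run.log mpirun -np 8 /opt/AMDuProf_Linux_x64_3.3.462/bin/AMDuProfCLI collect --config tbp --mpi --output-dir ~/stream/stream_mpi/tmp ./stream_flatmpi_c.exe
```

The resulting file then holds both the command and the interleaved output of all ranks, which can be attached to a reply.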

I tried the stream_mpi version, and it worked fine for me:

mpirun -np 8 /opt/AMDuProf_3.3-462/bin/AMDuProfCLI collect --config tbp --mpi -O /tmp/st-temp10 /media/amd/proj/Benchmarks/Stream/stream_mpi-gcc
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
/opt/AMDuProf_3.3-462/bin/AMDuProfCLI
Profile started ...
Profile started ...
Profile started ...
Profile started ...
Profile started ...
Profile started ...
Profile started ...
Profile started ...
-------------------------------------------------------------
STREAM version $Revision: 1.8 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Total Aggregate Array size = 2048000000 (elements)
Total Aggregate Memory per array = 15625.0 MiB (= 15.3 GiB).
Total Aggregate memory required = 46875.0 MiB (= 45.8 GiB).
Data is distributed across 8 MPI ranks
   Array size per MPI rank = 256000000 (elements)
   Memory per array per MPI rank = 1953.1 MiB (= 1.9 GiB).
   Total memory per MPI rank = 5859.4 MiB (= 5.7 GiB).
-------------------------------------------------------------
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Number of Threads requested for each MPI rank = 64
Number of Threads counted for rank 0 = 64
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 64885 microseconds.
   (= 64885 timer ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 timer ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:         139324.3     0.241246     0.235192     0.243996
Scale:         92568.1     0.356596     0.353988     0.361718
Add:          104197.8     0.474601     0.471718     0.478712
Triad:        105259.3     0.474221     0.466961     0.481544
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215259.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215260.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215255.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215256.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215258.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215262.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215261.caperf
Profile completed ...
Generated raw file : /tmp/st-temp10/AMDuProf-uprof-ethanolx2-Sep-17-2020_10-52-56-215257.caperf
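
In the run above each of the 8 ranks wrote its own .caperf file. As a quick sanity check, a sketch like the following could confirm that the output directory holds one raw file per rank. The helper name `count_caperf` is hypothetical, and the values of `ranks` and `outdir` are simply copied from the command above; adjust them for your own run.

```shell
# Hypothetical helper: count the raw profile files directly inside a directory.
count_caperf() {
    # find avoids the error an unmatched glob would cause in an empty directory.
    find "$1" -maxdepth 1 -type f -name '*.caperf' | wc -l | tr -d ' '
}

# Assumed values, taken from the run above.
ranks=8
outdir=/tmp/st-temp10

# Only check when the output directory actually exists on this machine.
if [ -d "$outdir" ]; then
    found=$(count_caperf "$outdir")
    if [ "$found" -eq "$ranks" ]; then
        echo "OK: $found raw files for $ranks ranks"
    else
        echo "Mismatch: $found raw files for $ranks ranks"
    fi
fi
```

Note that an output directory reused across several runs will accumulate files, so the count is only meaningful for a fresh directory.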