My MI300A system is currently running with the following configuration:
- Ubuntu 22.04
- Kernel 5.15.0-124-generic
- ROCm 6.2.2
- gfx942 architecture
- AMDuProf 5.0.1174
When I run AMDuProfCLI, the AMDuProfCLI-bin process aborts (SIGABRT) because of an uncaught std::out_of_range exception. This happens even for a plain --help. According to the backtrace, the exception is thrown by std::map::at inside AMDTGpuUtils::GetGpuAllAgentInfo (AMDTGpuUtils.cpp:423). That function looks up the key 'gfx942' in the m_gpuAgentAttributeInfo map, and the key is not present. This suggests that this build of AMDuProf 5.0 has no attribute entry for the gfx942 architecture, so it does not recognise the MI300A GPU. That is surprising, because AMD's official website states that AMDuProf 5.0 supports the MI300A, and I am using the latest version available.
root@R1:/mnt/software/AMDuProf_Linux_x64_5.0.1479/bin# gdb /mnt/software/AMDuProf_Linux_x64_5.0.1479/bin/AMDuProfCLI-bin
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mnt/software/AMDuProf_Linux_x64_5.0.1479/bin/AMDuProfCLI-bin...
(gdb) run --help
Starting program: /mnt/software/AMDuProf_Linux_x64_5.0.1479/bin/AMDuProfCLI-bin --help
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3182640 (LWP 607178)]
[Detaching after vfork from child process 607179]
[New Thread 0x7ffff21ff640 (LWP 607183)]
[New Thread 0x7ffff19fe640 (LWP 607184)]
[Thread 0x7ffff19fe640 (LWP 607184) exited]
[Thread 0x7ffff21ff640 (LWP 607183) exited]
[Detaching after fork from child process 607185]
terminate called after throwing an instance of 'std::out_of_range'
what(): map::at
Thread 1 "AMDuProfCLI-bin" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737283914368) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737283914368) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737283914368) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737283914368, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff60b4476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff609a7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff6444b9e in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff645020c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff6450277 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007ffff64504d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x00007ffff64474a0 in std::__throw_out_of_range(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007ffff5fd411c in std::map<std::string, GpuFamilyConstantAttributes, std::less<std::string>, std::allocator<std::pair<std::string const, GpuFamilyConstantAttributes> > >::at (
this=0x7ffff5fe8980 <m_gpuAgentAttributeInfo>, __k="gfx942") at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_map.h:548
#11 AMDTGpuUtils::GetGpuAllAgentInfo (this=<optimized out>)
at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CommonProfileLibs/AMDSysUtils/src/Linux/AMDTGpuUtils.cpp:423
#12 0x00007ffff5fd4b84 in AMDTGpuUtils::GetInstance ()
at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CommonProfileLibs/AMDSysUtils/src/Linux/AMDTGpuUtils.cpp:54
#13 0x00007ffff7375f21 in PopulateLocalSystemInfo (info=...)
at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CommonProfileLibs/AMDTBackendUtils/src/AMDTProfileUtils.cpp:1797
#14 0x00007ffff7376839 in fnGetSystemInfo () at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CommonProfileLibs/AMDTBackendUtils/src/AMDTProfileUtils.cpp:1838
#15 fnGetSystemInfo () at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CommonProfileLibs/AMDTBackendUtils/src/AMDTProfileUtils.cpp:1831
#16 0x0000000000436d26 in checkForSupportedPlatformAndOs () at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CLI/AMDuProfCLI/src/AMDTuProfCLI.cpp:498
#17 0x0000000000423d23 in main (argc=2, argv=<optimized out>) at /data/jenkins/workspace/AMDuProf_Linux-ALL-5.0/AMDProfiler/Components/CLI/AMDuProfCLI/src/AMDTuProfCLI.cpp:2056
(gdb)
I would be grateful if anyone who has encountered this issue could share any insights or solutions. Thank you very much.