05-27-2020
12:14 AM
Segfault in clinfo
Hi there,
I've installed the ROCm runtime on my Debian testing machine (upstream kernel 5.3.14-1, dual-socket Haswell Xeon) following the instructions. When I run clinfo, I get a segfault. Setting HSAKMT_DEBUG_LEVEL=7 sheds a bit more light, but not much:
$ HSAKMT_DEBUG_LEVEL=7 /opt/rocm-3.3.0/opencl/bin/x86_64/clinfo
acquiring VM for 13bc using 7
SVM alt (coherent): 0x1e00000 - 0x100167ffff
SVM (non-coherent): 0x1001680000 - 0x3fffffffff
Failed to map remapped mmio page on gpu_mem 0
[hsaKmtAllocMemory] node 0
[hsaKmtMapMemoryToGPU] address 0x1e08000
[hsaKmtAllocMemory] node 0
bind_mem_to_numa mem 0x1e01000 flags 0x40 size 0x1000 node_id 0
[hsaKmtMapMemoryToGPUNodes] address 0x1e01000 number of nodes 1
[hsaKmtAllocMemory] node 2
[hsaKmtAllocMemory] node 0
bind_mem_to_numa mem 0x1e14000 flags 0x40 size 0x2000 node_id 0
[hsaKmtMapMemoryToGPUNodes] address 0x1e14000 number of nodes 1
[hsaKmtAllocMemory] node 0
bind_mem_to_numa mem 0x1e04000 flags 0x1040 size 0x1000 node_id 0
[hsaKmtMapMemoryToGPUNodes] address 0x1e04000 number of nodes 1
[1] 1726631 segmentation fault (core dumped) HSAKMT_DEBUG_LEVEL=7 /opt/rocm-3.3.0/opencl/bin/x86_64/clinfo
The backtrace from gdb is not very informative, other than placing the crash somewhere inside the OpenCL runtime:
Thread 1 "clinfo" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x437265776f500403) at malloc.c:3102
3102 malloc.c: No such file or directory.
(gdb) bt
#0 __GI___libc_free (mem=0x437265776f500403) at malloc.c:3102
#1 0x00007ffff7a1ef65 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#2 0x00007ffff7a23737 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#3 0x00007ffff79ec4bf in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#4 0x00007ffff79e7096 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#5 0x00007ffff79b9b15 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#6 0x00007ffff7b37e39 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#7 0x00007ffff79b9c4c in clIcdGetPlatformIDsKHR () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#8 0x00007ffff7e4dae3 in ?? () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#9 0x00007ffff7e4e4e3 in clGetPlatformIDs () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#10 0x000000000040cdd1 in ?? ()
#11 0x0000000000403b8c in ?? ()
#12 0x00007ffff7c70e0b in __libc_start_main (main=0x403aa0, argc=1, argv=0x7fffffffe688, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe678) at ../csu/libc-start.c:308
#13 0x000000000040c1fe in ?? ()
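One detail in the backtrace may be worth a closer look: the address passed to free(), 0x437265776f500403, does not look like a heap pointer. Read as the eight bytes it would occupy in memory on a little-endian x86_64 machine, it is mostly printable ASCII. The short Python sketch below does that decoding; the helper name pointer_as_text is made up for illustration and is not part of any ROCm tool.

```python
import struct

def pointer_as_text(ptr: int) -> str:
    """Show a 64-bit value as the bytes it occupies in little-endian memory.

    Printable ASCII bytes are shown as characters, everything else as \\xNN.
    """
    raw = struct.pack("<Q", ptr)  # 8 bytes, least significant byte first
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in raw)

# The value gdb reported as the argument to free()
print(pointer_as_text(0x437265776F500403))  # prints \x03\x04PowerC
```

A pointer that decodes to text such as "PowerC" is consistent with the pointer field having been overwritten by string data before the free, or with a structure being read at the wrong offset. That is only a hint and not a diagnosis, but it could be useful detail to include in a bug report.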
What can I do to troubleshoot this?
AMD Device from lspci:
81:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev cb)
81:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Fiji HDMI/DP Audio [Radeon R9 Nano / FURY/FURY X]
Device nodes:
$ ls -lah /dev/kfd /dev/dri /dev/shm andreask_work@stout 0:16
crw-rw---- 1 root gpu 242, 0 Dec 2 11:18 /dev/kfd

/dev/dri:
total 0
drwxr-xr-x 3 root root 120 Dec 2 11:18 .
drwxr-xr-x 19 root root 3.5K Dec 14 07:00 ..
drwxr-xr-x 2 root root 100 Dec 2 11:18 by-path
crw-rw---- 1 root gpu 226, 0 Dec 2 11:18 card0
crw-rw---- 1 root gpu 226, 1 Dec 2 11:18 card1
crw-rw---- 1 root gpu 226, 128 Dec 2 11:18 renderD128

/dev/shm:
total 8.0K
drwxrwxrwt 2 root root 80 May 26 14:19 .
drwxr-xr-x 19 root root 3.5K Dec 14 07:00 ..
-rw-rw-r-- 1 root gpu 8 May 27 00:13 hsakmt_shared_mem
-rw-rw-r-- 1 root gpu 32 Dec 7 22:23 sem.hsakmt_semaphore
Thanks,
Andreas
2 Replies
05-27-2020
11:51 AM
Thank you for reporting this.
ROCm-related support is provided on the ROCm GitHub site. Please post the issue there: Issues · RadeonOpenCompute/ROCm · GitHub
Thanks.
05-30-2020
08:12 PM
