OpenCL

inducer77
Adept II

Segfault in clinfo

Hi there,

I've installed the ROCm runtime on my Debian testing machine (upstream kernel 5.3.14-1, dual-socket Haswell Xeon) following the instructions. When I run clinfo, I get a segfault. Setting "HSAKMT_DEBUG_LEVEL=7" sheds a bit more light, but not much:

$ HSAKMT_DEBUG_LEVEL=7 /opt/rocm-3.3.0/opencl/bin/x86_64/clinfo
acquiring VM for 13bc using 7
SVM alt (coherent):    0x1e00000 - 0x100167ffff
SVM (non-coherent): 0x1001680000 - 0x3fffffffff
Failed to map remapped mmio page on gpu_mem 0
[hsaKmtAllocMemory] node 0
[hsaKmtMapMemoryToGPU] address 0x1e08000
[hsaKmtAllocMemory] node 0
bind_mem_to_numa mem 0x1e01000 flags 0x40 size 0x1000 node_id 0
[hsaKmtMapMemoryToGPUNodes] address 0x1e01000 number of nodes 1
[hsaKmtAllocMemory] node 2
[hsaKmtAllocMemory] node 0
bind_mem_to_numa mem 0x1e14000 flags 0x40 size 0x2000 node_id 0
[hsaKmtMapMemoryToGPUNodes] address 0x1e14000 number of nodes 1
[hsaKmtAllocMemory] node 0
bind_mem_to_numa mem 0x1e04000 flags 0x1040 size 0x1000 node_id 0
[hsaKmtMapMemoryToGPUNodes] address 0x1e04000 number of nodes 1
[1]    1726631 segmentation fault (core dumped)  HSAKMT_DEBUG_LEVEL=7 /opt/rocm-3.3.0/opencl/bin/x86_64/clinfo

The backtrace from gdb is not very informative, other than placing the crash somewhere inside the OpenCL runtime:

Thread 1 "clinfo" received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x437265776f500403) at malloc.c:3102
3102    malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_free (mem=0x437265776f500403) at malloc.c:3102
#1  0x00007ffff7a1ef65 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#2  0x00007ffff7a23737 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#3  0x00007ffff79ec4bf in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#4  0x00007ffff79e7096 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#5  0x00007ffff79b9b15 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#6  0x00007ffff7b37e39 in ?? () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#7  0x00007ffff79b9c4c in clIcdGetPlatformIDsKHR () from /opt/rocm-3.3.0/opencl/lib/x86_64/libamdocl64.so
#8  0x00007ffff7e4dae3 in ?? () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#9  0x00007ffff7e4e4e3 in clGetPlatformIDs () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#10 0x000000000040cdd1 in ?? ()
#11 0x0000000000403b8c in ?? ()
#12 0x00007ffff7c70e0b in __libc_start_main (main=0x403aa0, argc=1, argv=0x7fffffffe688, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe678) at ../csu/libc-start.c:308
#13 0x000000000040c1fe in ?? ()
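One detail can be read off the backtrace: the pointer handed to free(), 0x437265776f500403, does not look like a heap address on x86-64. A minimal sketch, assuming the usual little-endian byte order, that prints the eight bytes as they would sit in memory (the variable names are only for illustration):

```python
# Pointer value taken verbatim from frame #0 of the backtrace.
ptr = 0x437265776F500403

# x86-64 stores integers little-endian, so this is the in-memory byte order.
raw = ptr.to_bytes(8, byteorder="little")

# Replace non-printable bytes with '.' for readability.
text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

print(raw)   # b'\x03\x04PowerC'
print(text)  # ..PowerC
```

Six of the eight bytes are printable ASCII ("PowerC"), which suggests, but does not prove, that free() was called on memory holding string data, for example after an overflow or on an uninitialized pointer. Where that string comes from cannot be determined from this output alone.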

What can I do to troubleshoot this?

AMD Device from lspci:

81:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev cb)
81:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Fiji HDMI/DP Audio [Radeon R9 Nano / FURY/FURY X]

Device nodes:

$ ls -lah /dev/kfd /dev/dri /dev/shm
crw-rw---- 1 root gpu  242, 0 Dec  2 11:18 /dev/kfd

/dev/dri:
total 0
drwxr-xr-x  3 root root      120 Dec  2 11:18 .
drwxr-xr-x 19 root root     3.5K Dec 14 07:00 ..
drwxr-xr-x  2 root root      100 Dec  2 11:18 by-path
crw-rw----  1 root gpu  226,   0 Dec  2 11:18 card0
crw-rw----  1 root gpu  226,   1 Dec  2 11:18 card1
crw-rw----  1 root gpu  226, 128 Dec  2 11:18 renderD128

/dev/shm:
total 8.0K
drwxrwxrwt  2 root root   80 May 26 14:19 .
drwxr-xr-x 19 root root 3.5K Dec 14 07:00 ..
-rw-rw-r--  1 root gpu     8 May 27 00:13 hsakmt_shared_mem
-rw-rw-r--  1 root gpu    32 Dec  7 22:23 sem.hsakmt_semaphore
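The device nodes in this listing are owned by root:gpu with mode crw-rw----, so only root and members of the gpu group can open them. A rough sketch for confirming that the current user can actually read and write the nodes; check_nodes is a hypothetical helper written for this post, not part of ROCm, and the two paths in the main block are simply taken from the listing:

```python
import os

def check_nodes(paths):
    """Return {path: status}, where status is 'ok', 'missing', or 'no access'."""
    result = {}
    for path in paths:
        if not os.path.exists(path):
            result[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            # The current user has both read and write permission.
            result[path] = "ok"
        else:
            result[path] = "no access"
    return result

if __name__ == "__main__":
    # Paths as they appear in the ls output; adjust for other systems.
    for path, status in check_nodes(["/dev/kfd", "/dev/dri/renderD128"]).items():
        print(f"{path}: {status}")
```

A "no access" result would point at group membership; note that a newly added group only takes effect in sessions started after the change. An "ok" result rules out plain file permissions as the cause but says nothing about the crash itself.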

Thanks,

Andreas

dipak
Big Boss

Thank you for reporting this.

ROCm-related support is provided through the ROCm GitHub site itself. Please post the issue there: Issues · RadeonOpenCompute/ROCm · GitHub

Thanks.
