ROCm OpenCL freezes on Linux in clCreateCommandQueue

Discussion created by linuxperia on Nov 29, 2019
Latest reply on Nov 29, 2019 by dipak

Hi all.

I have an AMD Vega 64 GPU with the latest ROCm 2.10 driver on a headless Linux server.
See the clinfo output below.

My OpenCL program always freezes in the call to clCreateCommandQueue.

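While narrowing this down, it can help to turn the indefinite freeze into a failure the program can detect. The following is a minimal, non-ROCm-specific pthreads sketch: it runs a possibly hanging call on a helper thread and gives up after a timeout. The names `run_with_timeout` and `watchdog_job` are hypothetical, and `fn` is assumed to be a small wrapper around the clCreateCommandQueue call.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical job record shared between the caller and the helper thread. */
struct watchdog_job {
    void (*fn)(void *);   /* the call that might hang */
    void *arg;
    bool done;            /* set by the helper once fn returns */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void *watchdog_worker(void *p)
{
    struct watchdog_job *job = p;
    job->fn(job->arg);
    pthread_mutex_lock(&job->lock);
    job->done = true;
    pthread_cond_signal(&job->cond);
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* Returns true if fn(arg) finished within timeout_sec seconds.
 * On timeout the helper thread is detached and its job is deliberately
 * leaked, because the stuck thread may still touch it later. */
bool run_with_timeout(void (*fn)(void *), void *arg, int timeout_sec)
{
    struct watchdog_job *job = malloc(sizeof *job);
    if (!job)
        return false;
    job->fn = fn;
    job->arg = arg;
    job->done = false;
    pthread_mutex_init(&job->lock, NULL);
    pthread_cond_init(&job->cond, NULL);

    pthread_t tid;
    if (pthread_create(&tid, NULL, watchdog_worker, job) != 0) {
        pthread_cond_destroy(&job->cond);
        pthread_mutex_destroy(&job->lock);
        free(job);
        return false;
    }

    /* pthread_cond_timedwait uses CLOCK_REALTIME by default. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&job->lock);
    int rc = 0;
    while (!job->done && rc == 0)   /* loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&job->cond, &job->lock, &deadline);
    bool finished = job->done;
    pthread_mutex_unlock(&job->lock);

    if (!finished) {
        pthread_detach(tid);        /* leave the stuck thread alone */
        return false;
    }
    pthread_join(tid, NULL);
    pthread_cond_destroy(&job->cond);
    pthread_mutex_destroy(&job->lock);
    free(job);
    return true;
}
```

For example, with a hypothetical wrapper `make_queue(void *)` that calls clCreateCommandQueue, a `false` result from `run_with_timeout(make_queue, &args, 10)` means the call did not return within ten seconds. The helper thread is left running on purpose, since there is no safe way to cancel a thread blocked inside the driver.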

I compiled my program with gcc and ran it in GDB. The backtrace below shows the main thread stuck waiting on a semaphore inside libamdocl64.so, called from clCreateCommandQueue, which looks like a deadlock bug in the ROCm OpenCL implementation.

Other, simpler single-threaded OpenCL example programs run fine on the same machine.
So it must be some kind of runtime bug that only appears when a program uses multiple threads.

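One hedged way to test the multithreading suspicion is a small stress harness that releases several threads at the same moment to run the same setup call. The sketch assumes plain POSIX threads; `run_concurrently` and `stress_ctx` are hypothetical names, and `fn` stands in for whatever per-thread OpenCL setup the real program performs (for instance, creating a command queue on a shared context).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state: a start gate plus the call each thread makes. */
struct stress_ctx {
    pthread_mutex_t lock;
    pthread_cond_t start;
    int go;                 /* 0 until every thread has been created */
    void (*fn)(void *);
    void *arg;
};

static void *stress_worker(void *p)
{
    struct stress_ctx *ctx = p;
    pthread_mutex_lock(&ctx->lock);
    while (!ctx->go)        /* wait at the gate */
        pthread_cond_wait(&ctx->start, &ctx->lock);
    pthread_mutex_unlock(&ctx->lock);
    ctx->fn(ctx->arg);      /* e.g. a wrapper that creates a command queue */
    return NULL;
}

/* Starts up to nthreads threads, releases them together, joins them,
 * and returns how many threads were actually started. */
int run_concurrently(void (*fn)(void *), void *arg, int nthreads)
{
    if (nthreads <= 0)
        return 0;
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    if (!tids)
        return 0;

    struct stress_ctx ctx = { .go = 0, .fn = fn, .arg = arg };
    pthread_mutex_init(&ctx.lock, NULL);
    pthread_cond_init(&ctx.start, NULL);

    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, stress_worker, &ctx) == 0)
        started++;

    /* Open the gate for all started threads at once to maximise overlap. */
    pthread_mutex_lock(&ctx.lock);
    ctx.go = 1;
    pthread_cond_broadcast(&ctx.start);
    pthread_mutex_unlock(&ctx.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&ctx.start);
    pthread_mutex_destroy(&ctx.lock);
    free(tids);
    return started;
}
```

Running the harness once with `nthreads` set to 1 and once with a larger value, using the same wrapper, is one way to check whether the freeze really depends on several threads being active; GDB's `thread apply all bt` can then show which threads are involved.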

P.S. I checked my system with "locate futex-internal.h" and cannot find the header file that GDB reports as missing anywhere.

Please help me fix this problem!

Thanks in advance for any help!

Thread 1 "myopencl" received signal SIGINT, Interrupt.
0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.

(gdb) bt
#0 0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x555555989318, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007ffff77397c8 in __new_sem_wait_slow (sem=0x555555989318, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007ffff0e44a70 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#4 0x00007ffff0e448b9 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#5 0x00007ffff0e57d23 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#6 0x00007ffff0e291d9 in clCreateCommandQueueWithProperties () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#7 0x00007ffff0e29469 in clCreateCommandQueue () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#8 0x0000555555559327 in main (argc=<optimized out>, argv=<optimized out>) at myopencl.c:9519

(gdb) thread apply all bt

Thread 10 (Thread 0x7ffedbfff700 (LWP 10072)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7ffedbfff700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7ffee0ffe700 (LWP 10071)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7ffee0ffe700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7ffee17ff700 (LWP 10070)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7ffee17ff700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fffecf61700 (LWP 10069)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7fffecf61700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fffed762700 (LWP 10068)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7fffed762700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fffedf63700 (LWP 10067)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7fffedf63700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fffee764700 (LWP 10066)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7fffee764700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffeef65700 (LWP 10065)):
#0 0x00007ffff77369f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
#3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#4 0x00007ffee6370552 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
#5 0x00007ffff77306db in start_thread (arg=0x7fffeef65700) at pthread_create.c:463
#6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fffefa86700 (LWP 10064)):
#0 0x00007ffff6f835d7 in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1 0x00007ffff0882f28 in kmtIoctl () from /opt/rocm/lib/libhsakmt.so.1
#2 0x00007ffff087d36f in hsaKmtWaitOnMultipleEvents () from /opt/rocm/lib/libhsakmt.so.1
#3 0x00007ffff0af2fd3 in core::Signal::WaitAny(unsigned int, hsa_signal_s const*, hsa_signal_condition_t const*, long const*, unsigned long, hsa_wait_state_t, long*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#4 0x00007ffff0adbdf6 in AMD::hsa_amd_signal_wait_any(unsigned int, hsa_signal_s*, hsa_signal_condition_t*, long*, unsigned long, hsa_wait_state_t, long*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#5 0x00007ffff0aeb48a in core::Runtime::AsyncEventsLoop(void*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#6 0x00007ffff0aae797 in os::ThreadTrampoline(void*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#7 0x00007ffff77306db in start_thread (arg=0x7fffefa86700) at pthread_create.c:463
#8 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff150ef00 (LWP 10058)):
#0 0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x555555989318, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007ffff77397c8 in __new_sem_wait_slow (sem=0x555555989318, abstime=0x0) at sem_waitcommon.c:181
#3 0x00007ffff0e44a70 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#4 0x00007ffff0e448b9 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#5 0x00007ffff0e57d23 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#6 0x00007ffff0e291d9 in clCreateCommandQueueWithProperties () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#7 0x00007ffff0e29469 in clCreateCommandQueue () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
#8 0x0000555555559327 in main (argc=<optimized out>, argv=<optimized out>) at myopencl.c:9519

Here is the output of clinfo:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3019.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XT [Radeon RX Vega 64]
Device Topology: PCI[ B#13, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1630Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f450a801d50
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3019.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program