Trouble with getting amdgpu drivers working with Vega on Ubuntu 16.04.3

Question asked by wdormann on Nov 1, 2017
Latest reply on Mar 16, 2018 by wdormann



I've been having a hard time getting the amdgpu-pro drivers working (so that I can use OpenCL) on my Ubuntu 16.04.3 system with a Radeon RX Vega 64.  I'm using the 17.40 drivers.  For starters:

The drivers do not work with a clean Ubuntu 16.04.3 install.  The reason:  Ubuntu 16.04.3 does not install with the Hardware Enablement (HWE) stack enabled by default.  To get the drivers working at all, even just to run X, you must run:

sudo apt install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04

If you don't do this, you'll get warnings about ABI mismatches.  And if you disable ABI version checking in your xorg.conf file, you'll just end up crashing X.  This should be very clearly documented in the install guide.  But better yet, the installer for amdgpu-pro should enforce that the system has HWE.  Otherwise, users are going to be pulling their hair out!
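For anyone who wants to check whether the HWE packages are already present before installing the driver, something along these lines should do it.  This is only a sketch: the function name check_hwe is my own invention, and it assumes the two package names from the apt command above are the ones that matter.

```shell
#!/bin/sh
# Hypothetical helper (not part of amdgpu-pro): report whether the two HWE
# packages from the apt command above are fully installed.
check_hwe() {
    missing=0
    for pkg in linux-generic-hwe-16.04 xserver-xorg-hwe-16.04; do
        # dpkg-query prints "install ok installed" for a fully installed package
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        if [ "$status" = "install ok installed" ]; then
            echo "$pkg: installed"
        else
            echo "$pkg: MISSING"
            missing=1
        fi
    done
    # Exit status 0 only if both packages are installed
    return "$missing"
}

check_hwe || echo "HWE stack incomplete; install it before installing amdgpu-pro"
```

The function returns non-zero if either package is missing, so an installer script could use it as a guard.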


Now, after I did the above I was able to get X working, but clinfo simply crashes:


$ env LLVM_BIN=/opt/amdgpu-pro/bin /opt/amdgpu-pro/bin/clinfo
terminate called after throwing an instance of 'cl::Error'
  what():  clGetPlatformIDs
Aborted (core dumped)


I've rebooted and subsequently installed rocm as the install guide recommends:

sudo apt install -y rocm-amdgpu-pro


I'm not sure that it's any help, but here's gdb info for the crash:


$ gdb $LLVM_BIN/clinfo
Reading symbols from /opt/amdgpu-pro/bin/clinfo...(no debugging symbols found)...done.
(gdb) r
Starting program: /opt/amdgpu-pro/bin/clinfo
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/".
terminate called after throwing an instance of 'cl::Error'
  what():  clGetPlatformIDs

Program received signal SIGABRT, Aborted.
0x00007ffff6cf3428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6cf3428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff6cf502a in __GI_abort () at abort.c:89
#2  0x000000000045b405 in ?? ()
#3  0x000000000045a1f6 in ?? ()
#4  0x000000000045a223 in ?? ()
#5  0x000000000045a32e in ?? ()
#6  0x0000000000407b5d in ?? ()
#7  0x000000000040f699 in ?? ()
#8  0x0000000000407c12 in ?? ()
#9  0x00007ffff6cde830 in __libc_start_main (main=0x407b60, argc=1, argv=0x7fffffffe598, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe588) at ../csu/libc-start.c:291
#10 0x000000000040e741 in ?? ()


If I run the Ubuntu-provided clinfo application, I get:


$ clinfo
Number of platforms                               0
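Zero platforms usually means the OpenCL ICD loader found no usable vendor entry.  As far as I understand it, on Linux the loader looks for *.icd files in /etc/OpenCL/vendors, each naming a vendor library to load.  Here is a rough sketch for listing what is registered there.  The function name list_icds is hypothetical, and I don't know which .icd file amdgpu-pro is supposed to install, so this only shows what is present:

```shell
#!/bin/sh
# Hypothetical helper: list OpenCL ICD registrations and the library each names.
# Assumes the default vendors directory /etc/OpenCL/vendors unless one is given.
list_icds() {
    dir=${1:-/etc/OpenCL/vendors}
    found=0
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue      # glob matched nothing
        found=1
        lib=$(head -n 1 "$f")        # first line names the vendor library
        case "$lib" in
            /*) # absolute path: we can check the file directly
                if [ -e "$lib" ]; then state="present"; else state="missing"; fi ;;
            *)  # bare name: lookup is left to the dynamic linker, not checked here
                state="left to the dynamic linker" ;;
        esac
        echo "$f -> $lib ($state)"
    done
    [ "$found" -eq 1 ] || echo "no .icd files in $dir"
}

list_icds
```

If this prints no entries, or an entry whose library is missing, that would at least be consistent with clGetPlatformIDs failing.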


Device info:


04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 687f (rev c1) (prog-if 00 [VGA controller])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 6b76
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at dc00 [size=256]
        Memory at fcb80000 (32-bit, non-prefetchable) [size=512K]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Capabilities: [2a0] Access Control Services
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] #13
        Capabilities: [2d0] #1b
        Capabilities: [320] Latency Tolerance Reporting
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu


strace of clinfo execution is attached.

Where do I go from here?