
Cannot Get OpenCL on Linux to Work At All

Discussion created by powerload on Feb 16, 2018
Latest reply on Mar 7, 2018 by ben-and-ellen

Hardware: Vega 56

OS: CentOS 7 with the stock CentOS kernel (I couldn't get the kernel driver to build against the mainline 4.9.x LT kernel).

 

I can't get the Vega recognized as an OpenCL device at all. Here is a trimmed-down terminal transcript of what I did:

# uname -r

3.10.0-693.17.1.el7.x86_64

[root@grumpy ~/ati/amdgpu-pro-17.50-511655]# ./amdgpu-pro-install --opencl=rocm --headless

[amdgpu-pro-local]

Name=AMD amdgpu Pro local repository

baseurl=file:///var/opt/amdgpu-pro-local

enabled=1

gpgcheck=0

 

 

Loaded plugins: fastestmirror

amdgpu-pro-local                                                                                            | 2.9 kB  00:00:00 

[...]

Dependencies Resolved

 

 

====================================================================================================================================

Package                                  Arch                Version                          Repository                    Size

====================================================================================================================================

Installing:

rocm-amdgpu-pro                          x86_64              17.50-511655.el7                amdgpu-pro-local              2.3 k

Installing for dependencies:

amdgpu-core                              noarch              17.50-511655.el7                amdgpu-pro-local              2.2 k

amdgpu-pro-core                          noarch              17.50-511655.el7                amdgpu-pro-local              2.2 k

hsa-ext-amdgpu-pro-finalize              x86_64              1.1.6-511655.el7                amdgpu-pro-local              2.9 M

hsa-ext-amdgpu-pro-image                  x86_64              1.1.6-511655.el7                amdgpu-pro-local              137 k

hsa-runtime-tools-amdgpu-pro              x86_64              1.1.6-511655.el7                amdgpu-pro-local              512 k

rocm-amdgpu-pro-icd                      x86_64              17.50-511655.el7                amdgpu-pro-local              17 M

rocm-amdgpu-pro-opencl                    x86_64              17.50-511655.el7                amdgpu-pro-local              2.0 k

rocr-amdgpu-pro                          x86_64              1.1.6-511655.el7                amdgpu-pro-local              243 k

roct-amdgpu-pro                          x86_64              1.0.7-511655.el7                amdgpu-pro-local              47 k

 

 

Transaction Summary

====================================================================================================================================

Install  1 Package (+9 Dependent packages)

 

 

Total download size: 21 M

Installed size: 21 M

Is this ok [y/d/N]: y

Downloading packages:

------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                              173 MB/s |  21 MB  00:00:00 

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Installing : amdgpu-core-17.50-511655.el7.noarch                                                                            1/10

  Installing : amdgpu-pro-core-17.50-511655.el7.noarch                                                                        2/10

  Installing : roct-amdgpu-pro-1.0.7-511655.el7.x86_64                                                                        3/10

  Installing : rocr-amdgpu-pro-1.1.6-511655.el7.x86_64                                                                        4/10

  Installing : rocm-amdgpu-pro-opencl-17.50-511655.el7.x86_64                                                                  5/10

  Installing : rocm-amdgpu-pro-icd-17.50-511655.el7.x86_64                                                                    6/10

  Installing : hsa-ext-amdgpu-pro-finalize-1.1.6-511655.el7.x86_64                                                            7/10

  Installing : hsa-ext-amdgpu-pro-image-1.1.6-511655.el7.x86_64                                                                8/10

  Installing : hsa-runtime-tools-amdgpu-pro-1.1.6-511655.el7.x86_64                                                            9/10

  Installing : rocm-amdgpu-pro-17.50-511655.el7.x86_64                                                                        10/10

  Verifying  : hsa-ext-amdgpu-pro-finalize-1.1.6-511655.el7.x86_64                                                            1/10

  Verifying  : rocr-amdgpu-pro-1.1.6-511655.el7.x86_64                                                                        2/10

  Verifying  : rocm-amdgpu-pro-icd-17.50-511655.el7.x86_64                                                                    3/10

  Verifying  : rocm-amdgpu-pro-17.50-511655.el7.x86_64                                                                        4/10

  Verifying  : amdgpu-pro-core-17.50-511655.el7.noarch                                                                        5/10

  Verifying  : rocm-amdgpu-pro-opencl-17.50-511655.el7.x86_64                                                                  6/10

  Verifying  : roct-amdgpu-pro-1.0.7-511655.el7.x86_64                                                                        7/10

  Verifying  : hsa-ext-amdgpu-pro-image-1.1.6-511655.el7.x86_64                                                                8/10

  Verifying  : hsa-runtime-tools-amdgpu-pro-1.1.6-511655.el7.x86_64                                                            9/10

  Verifying  : amdgpu-core-17.50-511655.el7.noarch                                                                            10/10

 

 

Installed:

  rocm-amdgpu-pro.x86_64 0:17.50-511655.el7                                                                                     

 

 

Dependency Installed:

  amdgpu-core.noarch 0:17.50-511655.el7                              amdgpu-pro-core.noarch 0:17.50-511655.el7                 

  hsa-ext-amdgpu-pro-finalize.x86_64 0:1.1.6-511655.el7              hsa-ext-amdgpu-pro-image.x86_64 0:1.1.6-511655.el7         

  hsa-runtime-tools-amdgpu-pro.x86_64 0:1.1.6-511655.el7            rocm-amdgpu-pro-icd.x86_64 0:17.50-511655.el7             

  rocm-amdgpu-pro-opencl.x86_64 0:17.50-511655.el7                  rocr-amdgpu-pro.x86_64 0:1.1.6-511655.el7                 

  roct-amdgpu-pro.x86_64 0:1.0.7-511655.el7                     

 

 

Complete!

[root@grumpy ~/ati/amdgpu-pro-17.50-511655]#

 

The installer doesn't seem to install the kernel driver at all, so I installed the DKMS package separately:

 

[root@grumpy ~/ati/amdgpu-pro-17.50-511655]# yum install amdgpu-dkms

[...]

====================================================================================================================================

Package                      Arch                    Version                            Repository                          Size

====================================================================================================================================

Installing:

amdgpu-dkms                  noarch                  17.50-511655.el7                  amdgpu-pro-local                  7.1 M

 

 

Transaction Summary

====================================================================================================================================

Install  1 Package

 

 

Total download size: 7.1 M

Installed size: 7.1 M

Is this ok [y/d/N]: y

Downloading packages:

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Installing : amdgpu-dkms-17.50-511655.el7.noarch                                                                              1/1

Loading new amdgpu-17.50-511655.el7 DKMS files...

dpkg: warning: version '3.10.0-693.17.1.el7.x86_64' has bad syntax: invalid character in revision number

dpkg: warning: version '3.10.0-693.17.1.el7.x86_64' has bad syntax: invalid character in revision number

dpkg: warning: version '4.9.81-1.el7.centos.x86_64' has bad syntax: invalid character in revision number

dpkg: warning: version '3.10.0-693.17.1.el7.x86_64' has bad syntax: invalid character in revision number

Building for 3.10.0-693.17.1.el7.x86_64 4.9.81-1.el7.centos.x86_64

Building initial module for 3.10.0-693.17.1.el7.x86_64

Done.

Forcing installation of amdgpu

 

 

amdgpu:

Running module version sanity check.

- Original module

  - No original module exists within this kernel

- Installation

  - Installing to /lib/modules/3.10.0-693.17.1.el7.x86_64/extra/

 

 

amdttm.ko:

Running module version sanity check.

- Original module

  - No original module exists within this kernel

- Installation

  - Installing to /lib/modules/3.10.0-693.17.1.el7.x86_64/extra/

 

 

amdkcl.ko:

Running module version sanity check.

- Original module

  - No original module exists within this kernel

- Installation

  - Installing to /lib/modules/3.10.0-693.17.1.el7.x86_64/extra/

 

 

amdkfd.ko:

Running module version sanity check.

- Original module

  - No original module exists within this kernel

- Installation

  - Installing to /lib/modules/3.10.0-693.17.1.el7.x86_64/extra/

Adding any weak-modules

 

 

depmod....

 

 

Backing up initramfs-3.10.0-693.17.1.el7.x86_64.img to /boot/initramfs-3.10.0-693.17.1.el7.x86_64.img.old-dkms

Making new initramfs-3.10.0-693.17.1.el7.x86_64.img

(If next boot fails, revert to initramfs-3.10.0-693.17.1.el7.x86_64.img.old-dkms image)

dracut.......

 

 

DKMS: install completed.

Building initial module for 4.9.81-1.el7.centos.x86_64

Error! Bad return status for module build on kernel: 4.9.81-1.el7.centos.x86_64 (x86_64)

Consult /var/lib/dkms/amdgpu/17.50-511655.el7/build/make.log for more information.

warning: %post(amdgpu-dkms-0:17.50-511655.el7.noarch) scriptlet failed, exit status 10

Non-fatal POSTIN scriptlet failure in rpm package amdgpu-dkms-17.50-511655.el7.noarch

  Verifying  : amdgpu-dkms-17.50-511655.el7.noarch                                                                              1/1

 

 

Installed:

  amdgpu-dkms.noarch 0:17.50-511655.el7                                                                                         

 

 

Complete!
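The DKMS build only succeeded for 3.10.0-693.17.1.el7 and failed for 4.9.81, so before booting either kernel it is worth confirming which ones actually got a module. A minimal POSIX sh sketch; the function name `amdgpu_kernels` is made up for this post, and it assumes DKMS installs into `/lib/modules/<kernel>/extra/`, as the log above shows:

```shell
#!/bin/sh
# Hypothetical helper: for each kernel under the module root, report whether
# a DKMS-built amdgpu module sits in its extra/ directory (where the DKMS log
# above says it installs). The .ko* glob also allows a compressed module.
amdgpu_kernels() {
    modroot=${1:-/lib/modules}
    for kdir in "$modroot"/*/; do
        [ -d "$kdir" ] || continue   # glob matched nothing
        kver=$(basename "$kdir")
        found=no
        for ko in "$kdir"extra/amdgpu.ko*; do
            [ -e "$ko" ] && found=yes
        done
        echo "$kver amdgpu=$found"
    done
}

# Usage: amdgpu_kernels            (defaults to /lib/modules)
```

A kernel reported with `amdgpu=no` simply has no DKMS-built amdgpu module installed.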

 

[root@grumpy ~/ati/amdgpu-pro-17.50-511655]# lsmod | grep amdgpu

[root@grumpy ~/ati/amdgpu-pro-17.50-511655]# modprobe amdgpu

[root@grumpy ~/ati/amdgpu-pro-17.50-511655]# lsmod | grep amdgpu

amdgpu              3143876  2

amdttm                110970  1 amdgpu

amdkcl                24897  3 amdgpu,amdkfd,amdttm

i2c_algo_bit          13413  2 i915,amdgpu

drm_kms_helper        159169  3 i915,amdgpu,nvidia_drm

drm                  370825  15 i915,drm_kms_helper,amdgpu,amdkcl,amdttm,nvidia_drm

i2c_core              40756  8 drm,i915,i2c_i801,i2c_hid,drm_kms_helper,i2c_algo_bit,amdgpu,nvidia
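The `--opencl=rocm` path installs the ROCt/ROCr runtime (`roct-amdgpu-pro`, `rocr-amdgpu-pro`), which, as far as I understand, reaches the GPU through the amdkfd module and the `/dev/kfd` device rather than through amdgpu alone. The `lsmod | grep amdgpu` output above only shows amdkfd indirectly (as a user of amdkcl), so a direct check might look like this sketch. The function name `kfd_ready` is hypothetical; it reads `/proc/modules`, whose first field is the module name:

```shell
#!/bin/sh
# Hypothetical helper: check that amdkfd is loaded and the KFD device node
# exists. Both locations are parameters; the defaults are the usual ones.
kfd_ready() {
    modules=${1:-/proc/modules}   # first field of each line = module name
    kfd=${2:-/dev/kfd}
    status=0
    if awk '$1 == "amdkfd" { found = 1 } END { exit (!found) }' "$modules" 2>/dev/null; then
        echo "amdkfd: loaded"
    else
        echo "amdkfd: NOT loaded"
        status=1
    fi
    if [ -c "$kfd" ]; then
        echo "$kfd: present"
    else
        echo "$kfd: MISSING"
        status=1
    fi
    return "$status"
}

# Usage: kfd_ready && echo "kernel side looks OK for ROCm"
```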

 

At this point there is no clinfo command available:

 

[root@grumpy ~]# yum install clinfo

[...]

====================================================================================================================================

Package                    Arch                        Version                                    Repository                Size

====================================================================================================================================

Installing:

clinfo                      x86_64                      2.1.17.02.09-1.el7                        epel                      39 k

 

 

Transaction Summary

====================================================================================================================================

Install  1 Package

 

 

Total download size: 39 k

Installed size: 83 k

Is this ok [y/d/N]: y

Downloading packages:

clinfo-2.1.17.02.09-1.el7.x86_64.rpm                                                                        |  39 kB  00:00:00 

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Installing : clinfo-2.1.17.02.09-1.el7.x86_64                                                                                1/1

  Verifying  : clinfo-2.1.17.02.09-1.el7.x86_64                                                                                1/1

 

 

Installed:

  clinfo.x86_64 0:2.1.17.02.09-1.el7                                                                                             

 

 

Complete!

[root@grumpy ~]# clinfo

Number of platforms                              0

 

[root@grumpy ~]# LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64 /usr/local/bin/ethminer --list-devices

  ✘  11:41:11|ethminer  No OpenCL platforms found

 

 

Listing CUDA devices.

FORMAT: [deviceID] deviceName

[0] GeForce GTX 1070 Ti

Compute version: 6.1

cudaDeviceProp::totalGlobalMem: 8508145664

[1] GeForce GTX 980 Ti

Compute version: 5.2

cudaDeviceProp::totalGlobalMem: 6373572608

[2] GeForce GTX 1070 Ti

Compute version: 6.1

cudaDeviceProp::totalGlobalMem: 8508145664

[3] GeForce GTX 1070

Compute version: 6.1

cudaDeviceProp::totalGlobalMem: 8508145664

[4] GeForce GTX 1070 Ti

Compute version: 6.1

cudaDeviceProp::totalGlobalMem: 8508145664

[5] GeForce GTX 1070 Ti

Compute version: 6.1

cudaDeviceProp::totalGlobalMem: 8508145664
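`Number of platforms 0` usually means the ICD loader did not find, or could not load, any vendor driver. On Linux the loader reads the `*.icd` files in `/etc/OpenCL/vendors`, each naming one driver library, so listing them is a cheap first check. A sketch with a made-up function name (`list_icds`); the ldconfig lookup for bare library names is only an approximation of what `dlopen()` does:

```shell
#!/bin/sh
# Hypothetical helper: list the ICD registrations the OpenCL loader reads.
# Each *.icd file names one driver library. Absolute paths are checked on
# disk; bare names are only looked up in the ldconfig cache, which
# approximates (but is not identical to) how dlopen() resolves them.
list_icds() {
    vendors=${1:-/etc/OpenCL/vendors}
    for icd in "$vendors"/*.icd; do
        [ -e "$icd" ] || { echo "no .icd files in $vendors"; return 1; }
        lib=$(head -n 1 "$icd")
        case $lib in
            /*) if [ -e "$lib" ]; then state=ok; else state=MISSING; fi ;;
            *)  if ldconfig -p 2>/dev/null | grep -qF "$lib"; then
                    state=ok
                else
                    state=not-in-ldconfig-cache
                fi ;;
        esac
        echo "$(basename "$icd") -> $lib [$state]"
    done
}

# Usage: list_icds                 (defaults to /etc/OpenCL/vendors)
```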

 

[root@grumpy ~]# yum install clinfo-amdgpu-pro-17.50-511655.el7.x86_64

[...]

====================================================================================================================================

Package                                Arch                Version                          Repository                      Size

====================================================================================================================================

Installing:

clinfo-amdgpu-pro                      x86_64              17.50-511655.el7                  amdgpu-pro-local              198 k

Installing for dependencies:

libopencl-amdgpu-pro                  x86_64              17.50-511655.el7                  amdgpu-pro-local                11 k

libopencl-amdgpu-pro-icd              x86_64              17.50-511655.el7                  amdgpu-pro-local                29 M

 

 

Transaction Summary

====================================================================================================================================

Install  1 Package (+2 Dependent packages)

 

 

Total download size: 29 M

Installed size: 29 M

Is this ok [y/d/N]: y

Downloading packages:

------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                              345 MB/s |  29 MB  00:00:00 

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Installing : libopencl-amdgpu-pro-17.50-511655.el7.x86_64                                                                    1/3

  Installing : libopencl-amdgpu-pro-icd-17.50-511655.el7.x86_64                                                                2/3

  Installing : clinfo-amdgpu-pro-17.50-511655.el7.x86_64                                                                        3/3

  Verifying  : libopencl-amdgpu-pro-17.50-511655.el7.x86_64                                                                    1/3

  Verifying  : libopencl-amdgpu-pro-icd-17.50-511655.el7.x86_64                                                                2/3

  Verifying  : clinfo-amdgpu-pro-17.50-511655.el7.x86_64                                                                        3/3

 

 

Installed:

  clinfo-amdgpu-pro.x86_64 0:17.50-511655.el7                                                                                   

 

 

Dependency Installed:

  libopencl-amdgpu-pro.x86_64 0:17.50-511655.el7                libopencl-amdgpu-pro-icd.x86_64 0:17.50-511655.el7             

 

 

Complete!

 

Once those are installed, everything segfaults outright:

 

[root@grumpy ~]# clinfo

Segmentation fault

 

[root@grumpy ~]# /opt/amdgpu-pro/bin/clinfo

Segmentation fault

 

[root@grumpy ~]# LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64 /usr/local/bin/ethminer --list-devices

Segmentation fault
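Before blaming the driver itself, it helps to confirm which `libOpenCL.so.1` each crashing binary actually resolves, with and without the `LD_LIBRARY_PATH` override. A sketch that filters `ldd` output; the function name `opencl_loader` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the path that
# libOpenCL.so.1 resolves to, "not found" if ldd could not resolve it, or a
# note if the binary does not link libOpenCL.so.1 at all.
opencl_loader() {
    awk '$1 == "libOpenCL.so.1" { print ($3 == "not" ? "not found" : $3); found = 1 }
         END { if (!found) print "libOpenCL.so.1 not linked" }'
}

# Usage:
#   ldd "$(command -v clinfo)" | opencl_loader
#   LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64 ldd /usr/local/bin/ethminer | opencl_loader
```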

 

If I remove all the Nvidia OpenCL libraries and re-run ldconfig, everything still segfaults.

If I instead remove the amdgpu-pro OpenCL libraries, no libOpenCL.so is left at all (I had already removed the Nvidia one), so nothing can find it:

 

[root@grumpy ~]# yum remove clinfo-amdgpu-pro-17.50-511655.el7.x86_64 libopencl-amdgpu-pro-icd-17.50-511655.el7.x86_64 libopencl-amdgpu-pro-17.50-511655.el7.x86_64

 

[...]

====================================================================================================================================

Package                                Arch                 Version                          Repository                       Size

====================================================================================================================================

Removing:

clinfo-amdgpu-pro                      x86_64               17.50-511655.el7                 @amdgpu-pro-local               780 k

libopencl-amdgpu-pro                   x86_64               17.50-511655.el7                 @amdgpu-pro-local                27 k

libopencl-amdgpu-pro-icd               x86_64               17.50-511655.el7                 @amdgpu-pro-local               102 M

 

 

Transaction Summary

====================================================================================================================================

Remove  3 Packages

 

 

Installed size: 103 M

Is this ok [y/N]: y

Downloading packages:

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Erasing    : clinfo-amdgpu-pro-17.50-511655.el7.x86_64                                                                        1/3

  Erasing    : libopencl-amdgpu-pro-icd-17.50-511655.el7.x86_64                                                                 2/3

  Erasing    : libopencl-amdgpu-pro-17.50-511655.el7.x86_64                                                                     3/3

  Verifying  : libopencl-amdgpu-pro-17.50-511655.el7.x86_64                                                                     1/3

  Verifying  : libopencl-amdgpu-pro-icd-17.50-511655.el7.x86_64                                                                 2/3

  Verifying  : clinfo-amdgpu-pro-17.50-511655.el7.x86_64                                                                        3/3

 

 

Removed:

  clinfo-amdgpu-pro.x86_64 0:17.50-511655.el7                        libopencl-amdgpu-pro.x86_64 0:17.50-511655.el7              

  libopencl-amdgpu-pro-icd.x86_64 0:17.50-511655.el7              

 

 

Complete!

[root@grumpy ~]# clinfo

clinfo: error while loading shared libraries: libOpenCL.so.1: cannot open shared object file: No such file or directory

 

 

So which libOpenCL is the correct one to use, and which package provides it? The only one that ships with the driver packages produces nothing but segfaults.

At this point I'm reasonably sure this has nothing to do with interference from the Nvidia drivers and libraries.

Note: I deleted the amdgpu kernel driver that ships with the kernel so that it wouldn't clash with the one built by DKMS from the driver bundle.
