
Radeon VII NOT recognized in clinfo OpenCL, cannot run compute jobs, but RX 580 is - Linux Ubuntu amdgpu-pro driver

Question asked by makeitwork on Apr 25, 2020
Latest reply on May 1, 2020 by elstaci

Alright, so I've been running two RX 580 8GB GPUs for over a year now on Ubuntu 18 with the amdgpu-pro driver and OpenCL support. I've had no major problems aside from some driver/kernel compile issues a while back, which were fixed when amdgpu-pro 20.10 updated its support for the Ubuntu HWE kernel.

 

Now I've upgraded to a Radeon VII (Vega 20 firmware) and have run into a slight issue. It may be because I didn't uninstall the proprietary driver before installing the card. The graphics are absolutely wonderful and work out of the box, but my Radeon VII is not detected for OpenCL/compute jobs. That's the primary reason I bought the card, and I've been unable to find any real answers using a variety of search engines and search terms. One of the RX 580 GPUs that I left installed is still detected! So I can use my CPU and my old GPU for OpenCL-enabled programs like hashcat, BOINC, Blender, etc., but the new Radeon VII isn't detected by these programs at all. Everywhere I look online, anyone who can't use OpenCL on a Radeon VII is told to install a few firmware files and the proprietary driver, which is what I've done.

 

Things I've tried:

Uninstalled the driver and rebooted

Added the vega20 firmware files to /lib/firmware/amdgpu/, then ran update-initramfs -k all -u and rebooted

Reinstalled the driver and rebooted; still no success. Graphics work, but the card doesn't show up in clinfo or in hashcat -b (benchmark)

Looked for any BIOS setting that might make a difference; no success. I'm running the same settings as always, which are the defaults plus some tweaks to increase case fan speeds

Command I'm using to install the driver:  ./amdgpu-pro-install --opencl=pal,legacy
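
One thing worth confirming after the firmware step above is that the vega20 files actually made it into the initramfs. A minimal sketch, assuming the firmware blobs follow the usual amdgpu/vega20_*.bin naming; the function name count_vega20_firmware is made up for this example:

```shell
#!/bin/sh
# Count vega20 firmware blobs in a file listing read from stdin.
# Input: one path per line, e.g. the output of lsinitramfs or ls.
# Assumption: the blobs are named amdgpu/vega20_<something>.bin.
count_vega20_firmware() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps callers using set -e alive.
    grep -c 'amdgpu/vega20_.*\.bin$' || true
}

# Example use on the running kernel's initramfs (not run here):
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | count_vega20_firmware
```

A count of 0 would suggest the update-initramfs step did not pick the files up; a non-zero count only shows the files are present, not that the OpenCL runtime can use the card.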

 

Things I've considered, but *not* tried:
Re-seating the card (shouldn't make a difference, should it?)

 

Diagnostic output:

 

uname -r

5.3.0-46-generic

sudo lshw -c video

  *-display                 
       description: VGA compatible controller
       product: Vega 20
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0c:00.0
       version: c1
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:89 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:e000(size=256) memory:fcb00000-fcb7ffff memory:c0000-dffff
  *-display
       description: VGA compatible controller
       product: Ellesmere [Radeon RX 470/480/570/570X/580/580X]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0d:00.0
       version: e7
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:91 memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:d000(size=256) memory:fce00000-fce3ffff memory:fce40000-fce5ffff
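
The lshw listing above is long, and the part that matters here is which kernel driver each card is bound to. A small sketch that condenses it to one line per card, assuming the "product:" and "configuration:" field layout shown above; the function name lshw_product_driver is hypothetical:

```shell
#!/bin/sh
# Read `lshw -c video` output on stdin and print "<product> -> <driver>"
# for each display entry. Prints "none" when no driver= field is present.
lshw_product_driver() {
    awk '
        /^ *product:/ {
            sub(/^ *product: */, "")   # strip the label, keep the name
            product = $0
        }
        /^ *configuration:/ {
            driver = "none"
            for (i = 1; i <= NF; i++)
                if ($i ~ /^driver=/) driver = substr($i, 8)
            print product " -> " driver
        }
    '
}

# Example use (not run here):
#   sudo lshw -c video | lshw_product_driver
```

For the listing above, both cards report driver=amdgpu, so the kernel side sees the Radeon VII; the gap is further up the stack.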

sudo lspci | grep VGA

0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 (rev c1)
0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)

sudo clinfo

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3075.10)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

 

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 1.1 None+Asserts, LLVM 6.0.0, SPIR, SLEEF, DISTRO, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

 

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     Ellesmere
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (3075.10)
  Driver Version                                  3075.10
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Board Name (AMD)                         Radeon RX 580 Series
  Device Topology (AMD)                           PCI-E, 0d:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               36
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1360MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     36
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4432814080 (4.128GiB)
  Global free memory (AMD)                        4308312 (4.109GiB)
  Global memory channels (AMD)                    8
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           3551587123 (3.308GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        3551587123 (3.308GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1587704465991140729ns (Thu Apr 23 23:01:05 2020)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             3187338544
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

 

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-AMD Ryzen 7 2700X Eight-Core Processor
  Device Vendor                                   AuthenticAMD
  Device Vendor ID                                0x1022
  Device Version                                  OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-znver1
  Driver Version                                  1.1
  Device OpenCL C Version                         OpenCL C 1.2 pocl
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               16
  Max clock frequency                             3700MHz
  Device Partition                                (core)
    Max number of sub-devices                     16
    Supported partition types                     equally, by counts
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                               16 / 16      
    int                                                  8 / 8       
    long                                                 4 / 4       
    half                                                 0 / 0        (n/a)
    float                                                8 / 8       
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              65265414144 (60.78GiB)
  Error Correction support                        No
  Max memory allocation                           17179869184 (16GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8388608 (8MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            1073741824 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               4194304 (4MiB)
  Max number of constant args                     8
  Max constant buffer size                        4194304 (4MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64

 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Ellesmere
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Ellesmere
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Ellesmere
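
To make the mismatch explicit, the PCI addresses from lspci can be compared against the "Device Topology (AMD)" lines in clinfo. This is only a sketch that assumes the output formats shown above; all three function names are made up for the example:

```shell
#!/bin/sh
# Bus address (first field) of every AMD VGA controller in lspci output.
lspci_amd_vga_addresses() {
    awk '/VGA compatible controller/ && /AMD\/ATI/ { print $1 }'
}

# Bus address (last field) of every "Device Topology (AMD)" line
# in clinfo output, e.g. "PCI-E, 0d:00.0" gives 0d:00.0.
clinfo_gpu_addresses() {
    awk '/Device Topology \(AMD\)/ { print $NF }'
}

# Print each address listed in $1 that does not appear in $2.
# Both arguments are whitespace/newline separated address lists.
missing_addresses() {
    for addr in $1; do
        printf '%s\n' "$2" | grep -qxF "$addr" || printf '%s\n' "$addr"
    done
}

# Example use (not run here):
#   missing_addresses "$(lspci | lspci_amd_vga_addresses)" \
#                     "$(clinfo | clinfo_gpu_addresses)"
```

Fed the lspci and clinfo output from this post, it would print 0c:00.0, the Radeon VII: the card is on the PCI bus but the AMD OpenCL platform does not list it.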

Relevant output from:
sudo journalctl | grep amd

kernel: Linux version 5.3.0-46-generic (buildd@lcy01-amd64-013) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 (Ubuntu 5.3.0-46.38~18.04.1-generic 5.3.18)
kernel: amd_uncore: AMD NB counters detected
kernel: amd_uncore: AMD LLC counters detected
kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
kernel: amdkcl: loading out-of-tree module taints kernel.
kernel: amdkcl: loading out-of-tree module taints kernel.
kernel: amdkcl: module verification failed: signature and/or required key missing - tainting kernel
kernel: [drm] amdgpu kernel modesetting enabled.
kernel: [drm] amdgpu version: 5.4.7.20.10
kernel: amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
kernel: amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
kernel: amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfcb00000 -> 0xfcb7ffff
kernel: fb0: switching to amdgpudrmfb from VESA VGA
kernel: amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
kernel: amdgpu 0000:0c:00.0: No more image in the PCI ROM
kernel: amdgpu 0000:0c:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
kernel: amdgpu 0000:0c:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
kernel: amdgpu 0000:0c:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
kernel: [drm] amdgpu: 16368M of VRAM memory ready
kernel: [drm] amdgpu: 16368M of GTT memory ready.
kernel: amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
kernel: amdgpu 0000:0c:00.0: HDCP: hdcp ta ucode is not available
kernel: amdgpu 0000:0c:00.0: DTM: dtm ta ucode is not available
kernel: fbcon: amdgpudrmfb (fb0) is primary device
kernel: amdgpu 0000:0c:00.0: fb0: amdgpudrmfb frame buffer device
kernel: amdgpu 0000:0c:00.0: ring gfx uses VM inv eng 0 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
kernel: amdgpu 0000:0c:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
kernel: amdgpu 0000:0c:00.0: ring sdma0 uses VM inv eng 0 on hub 1
kernel: amdgpu 0000:0c:00.0: ring page0 uses VM inv eng 1 on hub 1
kernel: amdgpu 0000:0c:00.0: ring sdma1 uses VM inv eng 4 on hub 1
kernel: amdgpu 0000:0c:00.0: ring page1 uses VM inv eng 5 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
kernel: amdgpu 0000:0c:00.0: ring vce0 uses VM inv eng 12 on hub 1
kernel: amdgpu 0000:0c:00.0: ring vce1 uses VM inv eng 13 on hub 1
kernel: amdgpu 0000:0c:00.0: ring vce2 uses VM inv eng 14 on hub 1
kernel: [drm] Initialized amdgpu 3.36.0 20150101 for 0000:0c:00.0 on minor 0
kernel: amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc0000000 -> 0xcfffffff
kernel: amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xd0000000 -> 0xd01fffff
kernel: amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfce00000 -> 0xfce3ffff
kernel: amdgpu 0000:0d:00.0: enabling device (0000 -> 0003)
kernel: amdgpu 0000:0d:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
kernel: amdgpu 0000:0d:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
kernel: [drm] amdgpu: 8192M of VRAM memory ready
kernel: [drm] amdgpu: 8192M of GTT memory ready.
kernel: amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
kernel: [drm] Initialized amdgpu 3.36.0 20150101 for 0000:0d:00.0 on minor 1
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
sensors[1279]: amdgpu-pci-0c00
sensors[1279]: amdgpu-pci-0d00
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
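
The repeated EDAC lines make the log above hard to scan. A rough sketch for narrowing it to one card and dropping exact duplicate lines, assuming the "amdgpu 0000:<address>" prefix seen above; the function name amdgpu_lines_for is hypothetical:

```shell
#!/bin/sh
# Read journal text on stdin and print only the amdgpu lines for the
# PCI address given as $1 (e.g. 0c:00.0), each distinct line once.
amdgpu_lines_for() {
    # -F treats the pattern as a fixed string, so the dots in the
    # address are not regex wildcards; awk drops repeated lines.
    grep -F "amdgpu 0000:$1" | awk '!seen[$0]++'
}

# Example use, kernel messages from the current boot (not run here):
#   journalctl -k -b | amdgpu_lines_for 0c:00.0
```

Note that this also hides lines without a PCI address, such as the [drm] and [powerplay] messages, so it is a first pass rather than a full view.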

Please help! Lol, I'm not sure why this is happening.
