Alright, so I've been running two RX 580 8GB GPUs for over a year now on Ubuntu 18.04 with the amdgpu-pro driver and its OpenCL support. I've had no major problems aside from some driver/kernel compilation issues a while back, which were fixed by the updated Ubuntu HWE support in amdgpu-pro 20.10.
Now I've upgraded to a Radeon VII (Vega 20) and have run into a slight issue. It may be related to the fact that I didn't uninstall the proprietary driver before installing the card. The graphics are absolutely wonderful and work out of the box, but the Radeon VII is not detected for OpenCL/compute jobs. That's the primary reason I bought the card, and I haven't been able to find any real answers using a variety of search engines and search terms. The RX 580 that I left installed is still detected! So I can use my CPU and my old GPU for OpenCL-enabled programs like hashcat, BOINC, Blender, etc., but the new Radeon VII isn't detected by these programs at all. Everywhere I look online, people who can't get OpenCL working on a Radeon VII are told to install a few firmware files and the proprietary driver, which is what I've done.
Things I've tried:
Uninstalled the driver and rebooted
Added the vega20 firmware files to /lib/firmware/amdgpu/, then ran update-initramfs -k all -u and rebooted
Reinstalled the driver and rebooted; still no success. Graphics work, but the card doesn't show up in clinfo or in hashcat -b (benchmark mode)
Looked for any BIOS setting that might make a difference, with no success. I'm running the same settings as always: defaults, plus some tweaks to increase the case fan speeds
Command I'm using to install the driver: ./amdgpu-pro-install --opencl=pal,legacy
Things I've considered, but *not* tried:
Re-seating the card (shouldn't make a difference, should it?)
Diagnostic output:
uname -r
5.3.0-46-generic
sudo lshw -c video
*-display
description: VGA compatible controller
product: Vega 20
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:0c:00.0
version: c1
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:89 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:e000(size=256) memory:fcb00000-fcb7ffff memory:c0000-dffff
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:0d:00.0
version: e7
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:91 memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:d000(size=256) memory:fce00000-fce3ffff memory:fce40000-fce5ffff
sudo lspci | grep VGA
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 (rev c1)
0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
sudo clinfo
Number of platforms 2
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3075.10)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 1.2 pocl 1.1 None+Asserts, LLVM 6.0.0, SPIR, SLEEF, DISTRO, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix POCL
Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name Ellesmere
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (3075.10)
Driver Version 3075.10
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Board Name (AMD) Radeon RX 580 Series
Device Topology (AMD) PCI-E, 0d:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 36
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1360MHz
Graphics IP (AMD) 8.0
Device Partition (core)
Max number of sub-devices 36
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4432814080 (4.128GiB)
Global free memory (AMD) 4308312 (4.109GiB)
Global memory channels (AMD) 8
Global memory banks per channel (AMD) 16
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 3551587123 (3.308GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 3551587123 (3.308GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1587704465991140729ns (Thu Apr 23 23:01:05 2020)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 2
Max real-time compute queues (AMD) 0
Max real-time compute units (AMD) 3187338544
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Platform Name Portable Computing Language
Number of devices 1
Device Name pthread-AMD Ryzen 7 2700X Eight-Core Processor
Device Vendor AuthenticAMD
Device Vendor ID 0x1022
Device Version OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-znver1
Driver Version 1.1
Device OpenCL C Version OpenCL C 1.2 pocl
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 16
Max clock frequency 3700MHz
Device Partition (core)
Max number of sub-devices 16
Supported partition types equally, by counts
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple 8
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 8 / 8
long 4 / 4
half 0 / 0 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 65265414144 (60.78GiB)
Error Correction support No
Max memory allocation 17179869184 (16GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8388608 (8MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 1073741824 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 32768x32768 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Global
Local memory size 4194304 (4MiB)
Max number of constant args 8
Max constant buffer size 4194304 (4MiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Ellesmere
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Ellesmere
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Ellesmere
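To see at a glance which GPU the OpenCL runtime is skipping, the lspci bus IDs above can be cross-checked against the "Device Topology (AMD)" lines in the clinfo output. Below is a minimal Python sketch of that check. It assumes plain `lspci` output (no domain prefix) and clinfo's AMD topology format exactly as shown above, and it only covers AMD platforms, since other vendors don't print that field. The helper names are hypothetical, written for this illustration.

```python
import re
import subprocess

# Bus IDs look like "0c:00.0" (bus:device.function), as printed by plain `lspci`.
BUS_ID = r"[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"

def gpu_bus_ids(lspci_text):
    """Bus IDs of every line `lspci` labels as a VGA compatible controller."""
    return {line.split()[0].lower()
            for line in lspci_text.splitlines()
            if "VGA compatible controller" in line}

def opencl_bus_ids(clinfo_text):
    """Bus IDs from clinfo's 'Device Topology (AMD)' lines (AMD platforms only)."""
    pattern = r"Device Topology \(AMD\)\s+PCI-E,\s*(" + BUS_ID + ")"
    return {m.lower() for m in re.findall(pattern, clinfo_text, re.IGNORECASE)}

def missing_from_opencl(lspci_text, clinfo_text):
    """GPUs the kernel enumerates but no AMD OpenCL platform reports."""
    return sorted(gpu_bus_ids(lspci_text) - opencl_bus_ids(clinfo_text))

def run(cmd):
    # stdout=PIPE + universal_newlines also works on Python 3.6 (Ubuntu 18.04's default).
    return subprocess.run(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True).stdout

def main():
    try:
        lspci_out, clinfo_out = run(["lspci"]), run(["clinfo"])
    except OSError as err:
        print("could not run", err.filename)
        return
    for bus_id in missing_from_opencl(lspci_out, clinfo_out):
        print("GPU at", bus_id, "is not visible to OpenCL")

if __name__ == "__main__":
    main()
```

With the lspci and clinfo output posted above, it would flag 0c:00.0, the Vega 20.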
Relevant output from:
sudo journalctl | grep amd
kernel: Linux version 5.3.0-46-generic (buildd@lcy01-amd64-013) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 (Ubuntu 5.3.0-46.38~18.04.1-generic 5.3.18)
kernel: amd_uncore: AMD NB counters detected
kernel: amd_uncore: AMD LLC counters detected
kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
kernel: amdkcl: loading out-of-tree module taints kernel.
kernel: amdkcl: loading out-of-tree module taints kernel.
kernel: amdkcl: module verification failed: signature and/or required key missing - tainting kernel
kernel: [drm] amdgpu kernel modesetting enabled.
kernel: [drm] amdgpu version: 5.4.7.20.10
kernel: amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
kernel: amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
kernel: amdgpu 0000:0c:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfcb00000 -> 0xfcb7ffff
kernel: fb0: switching to amdgpudrmfb from VESA VGA
kernel: amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
kernel: amdgpu 0000:0c:00.0: No more image in the PCI ROM
kernel: amdgpu 0000:0c:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
kernel: amdgpu 0000:0c:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
kernel: amdgpu 0000:0c:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
kernel: [drm] amdgpu: 16368M of VRAM memory ready
kernel: [drm] amdgpu: 16368M of GTT memory ready.
kernel: amdgpu: [powerplay] hwmgr_sw_init smu backed is vega20_smu
kernel: amdgpu 0000:0c:00.0: HDCP: hdcp ta ucode is not available
kernel: amdgpu 0000:0c:00.0: DTM: dtm ta ucode is not available
kernel: fbcon: amdgpudrmfb (fb0) is primary device
kernel: amdgpu 0000:0c:00.0: fb0: amdgpudrmfb frame buffer device
kernel: amdgpu 0000:0c:00.0: ring gfx uses VM inv eng 0 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
kernel: amdgpu 0000:0c:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
kernel: amdgpu 0000:0c:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
kernel: amdgpu 0000:0c:00.0: ring sdma0 uses VM inv eng 0 on hub 1
kernel: amdgpu 0000:0c:00.0: ring page0 uses VM inv eng 1 on hub 1
kernel: amdgpu 0000:0c:00.0: ring sdma1 uses VM inv eng 4 on hub 1
kernel: amdgpu 0000:0c:00.0: ring page1 uses VM inv eng 5 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
kernel: amdgpu 0000:0c:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
kernel: amdgpu 0000:0c:00.0: ring vce0 uses VM inv eng 12 on hub 1
kernel: amdgpu 0000:0c:00.0: ring vce1 uses VM inv eng 13 on hub 1
kernel: amdgpu 0000:0c:00.0: ring vce2 uses VM inv eng 14 on hub 1
kernel: [drm] Initialized amdgpu 3.36.0 20150101 for 0000:0c:00.0 on minor 0
kernel: amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc0000000 -> 0xcfffffff
kernel: amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xd0000000 -> 0xd01fffff
kernel: amdgpu 0000:0d:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfce00000 -> 0xfce3ffff
kernel: amdgpu 0000:0d:00.0: enabling device (0000 -> 0003)
kernel: amdgpu 0000:0d:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
kernel: amdgpu 0000:0d:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
kernel: [drm] amdgpu: 8192M of VRAM memory ready
kernel: [drm] amdgpu: 8192M of GTT memory ready.
kernel: amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
kernel: [drm] Initialized amdgpu 3.36.0 20150101 for 0000:0d:00.0 on minor 1
kernel: EDAC amd64: Node 0: DRAM ECC disabled.
kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
sensors[1279]: amdgpu-pci-0c00
sensors[1279]: amdgpu-pci-0d00
Please help! Lol, not sure why this is happening.
I'm really not sure whether this 2019 GitHub thread is of any use in your case, but that user needed to install ROCm to get OpenCL working on his Radeon VII: ROCm installation flummoxed -Radeon VII, Ubuntu 18.04 - cancel · Issue #860 · RadeonOpenCompute/ROCm...
You might want to post your question at Newcomers Start Here so that you can be whitelisted to post in the AMD OpenCL forum: OpenCL
Just posted my request to be whitelisted, and I'll try to find some installation instructions for ROCm on Ubuntu 18.04.4 HWE.
Update: I've removed the amdgpu-pro driver using the amdgpu-pro-uninstall script, rebooted, and then attempted to install ROCm. Unfortunately, it hits a version mismatch with gcc-7, so I'll post about that on the OpenCL forum once I'm whitelisted.
The following package is a dependency of hcc, which is required by rocm-dev, which is part of rocm-dkms:
$ sudo apt install gcc-7-multilib
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
gcc-7-multilib : Depends: gcc-7-base (= 7.4.0-1ubuntu1~18.04) but 7.5.0-3ubuntu1~18.04 is to be installed
Depends: gcc-7 (= 7.4.0-1ubuntu1~18.04) but 7.5.0-3ubuntu1~18.04 is to be installed
Depends: lib32gcc-7-dev (= 7.4.0-1ubuntu1~18.04) but it is not going to be installed
Depends: libx32gcc-7-dev (= 7.4.0-1ubuntu1~18.04) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
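The unmet-dependency error above wants gcc-7-multilib at 7.4.0 while gcc-7-base is at 7.5.0. Both binary packages are built from the same gcc-7 source package, so their candidate versions would normally match, and `apt-cache policy` shows which repository offers which version. The Python sketch below prints each available version with its source so a stale mirror stands out. It assumes the usual `apt-cache policy` layout (Installed / Candidate / Version table); the default package names come from the error above, and the function names are hypothetical.

```python
import subprocess

def parse_policy(text):
    """Parse `apt-cache policy PKG` into (candidate, {version: [sources]})."""
    candidate, versions, current, in_table = None, {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Candidate:"):
            candidate = line.split(":", 1)[1].strip()
        elif line.startswith("Version table:"):
            in_table = True
        elif in_table and line:
            tokens = line.replace("***", "").split()
            # Source lines: "<priority> <url-or-path> ...", e.g. "500 http://... Packages"
            if (len(tokens) >= 2 and tokens[0].isdigit()
                    and ("://" in tokens[1] or tokens[1].startswith("/"))):
                if current is not None:
                    versions[current].append(" ".join(tokens[1:]))
            # Version lines: "<version> <priority>", prefixed with "***" if installed
            elif len(tokens) == 2:
                current = tokens[0]
                versions[current] = []
    return candidate, versions

def main(packages=("gcc-7-base", "gcc-7-multilib")):
    candidates = {}
    for pkg in packages:
        try:
            out = subprocess.run(["apt-cache", "policy", pkg],
                                 stdout=subprocess.PIPE,
                                 universal_newlines=True).stdout
        except OSError as err:
            print("could not run apt-cache:", err)
            return
        candidate, versions = parse_policy(out)
        candidates[pkg] = candidate
        print(pkg, "candidate:", candidate)
        for version, sources in versions.items():
            print("   ", version, "from:", "; ".join(sources) or "(none listed)")
    # Packages from one source package should share a candidate version.
    if len(set(candidates.values())) > 1:
        print("Candidates differ; one configured mirror may be out of date.")

if __name__ == "__main__":
    main()
```

This only reads apt's metadata; it changes nothing on the system.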
To be whitelisted, you first need to describe the problem you're having in detail at Newcomers Start Here. Just copy the post you made here, paste it into the thread you open there, and change the title to match this one. The moderators then decide which DevGURU forum it belongs in and whether to whitelist you.
So post the same problem you posted here, along with all your other replies, and see whether the moderators give you access to the AMD OpenCL forum (if they feel it's applicable there) or keep it here in this forum.
The way you posted your question will never get any attention from the Moderators since they don't know the issue you are having.
Updated, thanks!
EDIT: As a side note, installing the headless version of the driver still only allowed me to use the RX 580 and not the Radeon VII for OpenCL. I always completely reboot after installing/uninstalling the driver, and it is currently completely removed - I'm using the open source driver by default for graphics at the moment with no OpenCL/proprietary drivers installed.
Request to get whitelisted -> https://community.amd.com/thread/252105
UPDATE
I've filed a bug report with Ubuntu against their gcc-7 package, since gcc-7-multilib fails to install on Ubuntu 18.04 HWE. It's a requirement for rocm-dkms (pulled in via rocm-dev and hcc), and it doesn't appear to be the fault of the ROCm maintainers at all. Attempting to install gcc-7-multilib results in a version mismatch between gcc 7.4 and 7.5.
So I'm still having the problem. The fix is (hopefully) installing rocm-dkms, which requires this bug to be fixed on Ubuntu's side. Otherwise, I'll have to wait for an updated amdgpu-pro driver with OpenCL support for the Radeon VII.
Link to the bug report: https://bugs.launchpad.net/ubuntu/+source/gcc-7/+bug/1875224
UPDATE
The rocm-dkms problem was due to a bad apt mirror, and rocm-dkms is now installed. Both cards now show up in ROCm's clinfo. However, none of the programs that actually use OpenCL work now. An example with hashcat:
$ hashcat -b
hashcat (v5.1.0-1426-gb02fe8e0) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
OpenCL API (OpenCL 2.1 AMD-APP (3098.0)) - Platform #1 [Advanced Micro Devices, Inc.]
=====================================================================================
* Device #1: gfx906+sram-ecc, 13912/16368 MB allocatable, 60MCU
* Device #2: gfx803, 6963/8192 MB allocatable, 36MCU
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 0 - MD5
clBuildProgram(): CL_BUILD_PROGRAM_FAILURE
Started: Sun Apr 26 17:23:03 2020
Stopped: Sun Apr 26 17:23:05 2020
And with boinc:
$ boinc
26-Apr-2020 17:23:29 [---] cc_config.xml not found - using defaults
26-Apr-2020 17:23:29 [---] Starting BOINC client version 7.9.3 for x86_64-pc-linux-gnu
26-Apr-2020 17:23:29 [---] log flags: file_xfer, sched_ops, task
26-Apr-2020 17:23:29 [---] Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
26-Apr-2020 17:23:29 [---] Data directory: /home/user
execv: No such file or directory
26-Apr-2020 17:23:29 [---] GPU detection failed. error code 512
26-Apr-2020 17:23:29 [---] No usable GPUs found...
And with ethdcrminer64 (Claymore's GPU miner):
...
AMD Cards available: 2
GPU #0: gfx906+sram-ecc (Vega 20), 16368 MB available, 60 compute units (pci bus 12:0:0)
GPU #0 recognized as Vega
GPU #1: gfx803 (Ellesmere [Radeon RX 470/480/570/570X/580/580X]), 8192 MB available, 36 compute units (pci bus 13:0:0)
POOL/SOLO version
AMD ADL library not found.
Cannot build OpenCL program for GPU 0
Cannot build OpenCL program for GPU 1...
From GitHub (RadeonOpenCompute/ROCm: ROCm - Open Source Platform for HPC and Ultrascale GPU Computing):
The ROCm v3.3.x platform is designed to support the following operating systems:
Ubuntu 16.04.6 (Kernel 4.15) and 18.04.4 (Kernel 5.3)
CentOS v7.7 (Using devtoolset-7 runtime support)
RHEL v7.7 (Using devtoolset-7 runtime support)
SLES 15 SP1
Access the following links for more information on:
ROCm documentation, see https://rocm-documentation.readthedocs.io/en/latest/index.html
ROCm Release Notes https://rocm-documentation.readthedocs.io/en/latest/Current_Release_Notes/Current-Release-Notes.html
ROCm QuickStart Installation Guide, see https://rocm-documentation.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html
ROCm binary structure, see https://rocm-documentation.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html#rocm-...
Instructions to install PyTorch after ROCm is installed – https://rocm-documentation.readthedocs.io/en/latest/Deep_learning/Deep-learning.html#pytorch
Note: These instructions reference the rocm/pytorch:rocm3.0_ubuntu16.04_py2.7_pytorch image. However, you can substitute the Ubuntu 18.04 image listed at https://hub.docker.com/r/rocm/pytorch/tags
Here are the OpenCL libraries for ROCm: GitHub - RadeonOpenCompute/ROCm-OpenCL-Runtime at roc-3.3.0 and how to install them.
Hopefully your thread will be moved if the OpenCL moderators believe it's applicable, most likely on Monday.
Just confirming that I'm running Ubuntu 18.04.4 HWE and that my kernel is 5.3.0-46-generic.
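That check against the support list can also be scripted. The Python sketch below compares /etc/os-release and the running kernel against the Ubuntu rows of the ROCm v3.3.x list quoted earlier (16.04 with kernel 4.15, 18.04 with kernel 5.3). It is a rough approximation under stated assumptions: it only covers Ubuntu, and because VERSION_ID reads "18.04" for every 18.04.x point release, it relies on the kernel series (5.3 for the 18.04.4 HWE stack) rather than the point release itself. The table and function names are hypothetical.

```python
import platform

# Ubuntu rows of the ROCm v3.3.x support list quoted above:
# (ID, VERSION_ID) -> expected kernel series. Point releases are not
# distinguished, because VERSION_ID only carries e.g. "18.04".
ROCM33_UBUNTU = {("ubuntu", "16.04"): "4.15", ("ubuntu", "18.04"): "5.3"}

def parse_os_release(text):
    """Parse KEY=value / KEY="value" lines from /etc/os-release."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value.strip('"')
    return fields

def kernel_series(release):
    """'5.3.0-46-generic' -> '5.3'."""
    return ".".join(release.split(".")[:2])

def matches_rocm33_list(os_release_text, kernel_release):
    fields = parse_os_release(os_release_text)
    expected = ROCM33_UBUNTU.get((fields.get("ID"), fields.get("VERSION_ID")))
    return expected is not None and kernel_series(kernel_release) == expected

def main():
    try:
        with open("/etc/os-release") as f:
            text = f.read()
    except OSError:
        print("/etc/os-release not readable")
        return
    kernel = platform.release()  # same string as `uname -r` on Linux
    ok = matches_rocm33_list(text, kernel)
    print("kernel", kernel, "matches" if ok else "does not match",
          "the ROCm v3.3.x Ubuntu list")

if __name__ == "__main__":
    main()
```

For example, the Ubuntu 20.04 machine with kernel 5.4.0-40 described later in this thread would not match, since 20.04 isn't on the v3.3.x list.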
I got everything in the ROCm-OpenCL-Runtime GitHub repo working up to the make command, which fails about halfway through. At first it was because it couldn't find any OpenCL headers (#include <CL/any-file-name.h> wasn't working). I tried the latest OpenCL headers referenced in the ocl-icd GitHub repo, but they have compilation errors. I'll try Ubuntu's OpenCL headers package when I get more time to work on this.
Just to update, I still can't get the ROCm runtime to compile. I'm aware that I should post that problem to their GitHub project, and I will when I get time (it's been a busy week). Any work toward getting the Radeon VII's OpenCL component working in the amdgpu-pro driver would be tremendously appreciated.
Further replies to this thread should probably happen on the OpenCL forum's request, but I'm subscribed to updates for either if a solution is found.
In that case, I suggest you install the AMD GPU PRO driver so that users in the OpenCL forum can help you with it.
For ROCm, you will otherwise need to post on GitHub.
...The entire point of this thread has been that Radeon VII is not recognized by the amdgpu-pro driver as an OpenCL device...
...I followed your recommendation to try and get ROCm to work... I'll continue to do that on the ROCm github when I get time...
I can remove ROCm and install amdgpu-pro at any time and that's what I'd prefer to do.
I realize that was your original problem with the Radeon VII and amdgpu-pro driver.
Hopefully ROCm will finally get OpenCL to work on your GPU card.
But if that ends up in a dead end, you can always install amdgpu-pro driver again and ask the Users at OpenCL Forum to see if anyone was able to get the Radeon VII to work in OpenCL.
Anyway, I have nothing else to offer as far as suggestions goes.
Hopefully someone at Github or OpenCL forum will eventually solve your problem if you don't find out the answer yourself.
I wonder, did you ever get anywhere with this issue? I have the same problem on Ubuntu 20.04 with an RX 570 and an RX 5700 XT in the same system. Individually they work fine, but never both together; it favours the 570 when both are connected.
This with the newly released (as of July 2020) driver from here: https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-20-20
For what it's worth the same hardware booting into Windows 10 works fine, i.e. both GPUs are detected and run compute jobs.
Installed with:
$ ./amdgpu-pro-install -y --opencl=pal,legacy --no-32 --headless
$ uname -r
5.4.0-40-generic
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
$ sudo lshw -c video
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:03:00.0
version: ef
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:34 memory:b0000000-bfffffff memory:cfc00000-cfdfffff ioport:c000(size=256) memory:fbcc0000-fbcfffff memory:fbca0000-fbcbffff
*-display
description: VGA compatible controller
product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:07:00.0
version: c1
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:35 memory:d0000000-dfffffff memory:cfe00000-cfffffff ioport:e000(size=256) memory:fbf80000-fbffffff memory:fbf60000-fbf7ffff
I ended up opening another thread for this issue in the OpenCL forum, which did not move any closer to a resolution. There were some suggestions made. I have not had time since then to swap out the cards individually on the hardware level, but I suspect that your problem and mine are identical. I don't think there is a fix for this at this time.
Thanks for the update. Yes, the issue sounds very similar. From following many threads in this and other forums, it seems to be a common problem:
OpenCL PAL & Legacy platforms under Ubuntu
The latter thread is the most promising. I attempted to make the suggested fix/hack to the equivalent file in the latest version of the drivers (amdgpu-pro-20.20-1098277-ubuntu-20.04) but the path string looks different:
So not sure what I am supposed to change.
I've raised a ticket with AMD support, for what it's worth; I'll see if I get a response.
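Since an --opencl=pal,legacy install presumably registers more than one OpenCL implementation, it can help to see exactly which ones the ICD loader will pick up. On Linux, the loader reads the *.icd files in /etc/OpenCL/vendors, each of which names a vendor library. The Python sketch below just lists those files and the library each one points to; it does not load anything, and it makes no attempt to tell which file belongs to PAL versus legacy. The function name is hypothetical.

```python
import glob
import os

def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file in vendor_dir to the library name/path it contains."""
    icds = {}
    for path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(path) as f:
            icds[os.path.basename(path)] = f.read().strip()
    return icds

def main():
    icds = list_icds()
    if not icds:
        print("no .icd files found in /etc/OpenCL/vendors")
    for name, library in icds.items():
        print(name, "->", library)

if __name__ == "__main__":
    main()
```

Comparing this list before and after a driver install or uninstall shows which OpenCL implementations were added or left behind.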
They have implemented the fix. Update to the latest driver.