Hi,
I am using CentOS 7. I was using the amdgpu driver, but I don't remember whether it was the pro variant or not. I could run OpenCL until a computation ran long enough (1-2 s), at which point it froze, and sometimes the whole machine froze, with a hard reset as the only solution. Awful.
So I wanted to try the new drivers, just in case, and now it is even worse: I can neither list devices with clinfo nor execute the simplest kernel.
What is going on?
I am only trying to update because the GPU eventually freezes when I run kernels, and now it cannot even start.
$ clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (3224.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx1010
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0
Driver Version 3224.0 (HSA1.1,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Device Board Name (AMD) Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
Device Topology (AMD) PCI-E, 85:00.0
Max compute units 20
SIMD per compute unit (AMD) 4
SIMD width (AMD) 32
SIMD instruction width (AMD) 1
Max clock frequency 2100MHz
Graphics IP (AMD) 10.1
Device Partition (core)
Max number of sub-devices 20
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Compiler Available Yes
Linker Available Yes
^C
Program received signal SIGINT, Interrupt.
0x00007ffff05a22f1 in rocr::core::InterruptSignal::WaitRelaxed(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
(gdb) bt
#0 0x00007ffff05a22f1 in rocr::core::InterruptSignal::WaitRelaxed(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#1 0x00007ffff05a21aa in rocr::core::InterruptSignal::WaitAcquire(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#2 0x00007ffff0599cf9 in rocr::HSA::hsa_signal_wait_scacquire(hsa_signal_s, hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#3 0x00007ffff057c500 in rocr::AMD::BlitKernel::SubmitLinearCopyCommand(void*, void const*, unsigned long) ()
from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#4 0x00007ffff059048a in rocr::(anonymous namespace)::RegionMemory::Freeze() ()
from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#5 0x00007ffff05b5484 in rocr::amd::hsa::loader::Segment::Freeze() [clone .part.0] ()
from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#6 0x00007ffff05b54f6 in rocr::amd::hsa::loader::ExecutableImpl::Freeze(char const*) ()
from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#7 0x00007ffff05b4c56 in rocr::amd::hsa::loader::AmdHsaCodeLoader::FreezeExecutable(rocr::amd::hsa::loader::Executable*, char const*) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#8 0x00007ffff059c528 in rocr::HSA::hsa_executable_freeze(hsa_executable_s, char const*) ()
from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#9 0x00007ffff7f6edb8 in roc::LightningProgram::setKernels(amd::option::Options*, void*, unsigned long, int, unsigned long, std::string) () from /opt/amdgpu-pro/lib64/libamdocl64.so
#10 0x00007ffff7f660ac in device::Program::linkImplLC(amd::option::Options*) ()
from /opt/amdgpu-pro/lib64/libamdocl64.so
#11 0x00007ffff7f66b11 in device::Program::build(std::string const&, char const*, amd::option::Options*) ()
from /opt/amdgpu-pro/lib64/libamdocl64.so
#12 0x00007ffff7f25759 in amd::Program::build(std::vector<amd::Device*, std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool, bool) () from /opt/amdgpu-pro/lib64/libamdocl64.so
#13 0x00007ffff7efbf34 in clBuildProgram () from /opt/amdgpu-pro/lib64/libamdocl64.so
#14 0x0000000000405a76 in getWGsizes ()
#15 0x0000000000405f6c in device_info_wg ()
#16 0x0000000000406f9e in printDeviceInfo ()
#17 0x0000000000407ba2 in showDevices ()
#18 0x00000000004014dc in main ()
It freezes at this line.
Currently I am on version 20.50 (freezing at clinfo or with the simplest kernel); previously I was on 20.20 or 20.40 (I don't remember which), where it froze during execution (e.g. a 1024x1024 matrix multiplication can finish, a 2048x2048 one cannot).
I tried different installations:
- ./amdgpu-install -y --opencl=legacy,rocr --headless
- ./amdgpu-pro-install -y --opencl=legacy,rocr --headless
- ./amdgpu-pro-install -y --opencl=legacy --headless
- ./amdgpu-pro-install -y --opencl=rocr --headless
Example of packages installed with pro and rocr:
amdgpu-core noarch 20.50-1234664.el7 @amdgpu-pro-local 0.0
amdgpu-dkms noarch 1:5.9.10.69-1234664.el7 @amdgpu-pro-local 224 M
amdgpu-dkms-firmware noarch 1:5.9.10.69-1234664.el7 @amdgpu-pro-local 43 M
amdgpu-pro-versionlist noarch 20.50-1234664.el7 @amdgpu-pro-local 1.8 k
amdgpu-versionlist noarch 20.50-1234664.el7 @amdgpu-pro-local 2.0 k
amdgpu-pro-core noarch 20.50-1234664.el7 @amdgpu-pro-local 9.3 k
amdgpu-pro-rocr-opencl x86_64 20.50-1234664.el7 @amdgpu-pro-local 9.3 k
clinfo-amdgpu-pro x86_64 20.50-1234664.el7 @amdgpu-pro-local 179 k
comgr-amdgpu-pro x86_64 1.9.0-1234664.el7 @amdgpu-pro-local 126 M
hip-rocr-amdgpu-pro x86_64 20.50-1234664.el7 @amdgpu-pro-local 2.3 M
hsa-runtime-rocr-amdgpu x86_64 1.2.0-1234664.el7 @amdgpu-pro-local 2.4 M
hsakmt-roct-amdgpu x86_64 1.0.9-1234664.el7 @amdgpu-pro-local 624 k
libdrm-amdgpu x86_64 1:2.4.100-1234664.el7 @amdgpu-pro-local 230 k
libdrm-amdgpu-common noarch 1.0.0-1234664.el7 @amdgpu-pro-local 8.5 k
ocl-icd-amdgpu-pro x86_64 20.50-1234664.el7 @amdgpu-pro-local 53 k
opencl-rocr-amdgpu-pro x86_64 20.50-1234664.el7 @amdgpu-pro-local 1.7 M
Example of installation on CentOS 7.9:
$ ~/a/amdgpu-pro-20.50-1234664-rhel-7.9 ❯❯❯ sudo ./amdgpu-pro-install --headless --opencl=legacy,rocr
[amdgpu-pro-local]
name=AMD amdgpu Pro local repository
baseurl=file:///var/opt/amdgpu-pro-local
enabled=1
gpgcheck=0
...
Total 151 MB/s | 85 MB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : amdgpu-core-20.50-1234664.el7.noarch 1/15
Installing : amdgpu-pro-core-20.50-1234664.el7.noarch 2/15
Installing : comgr-amdgpu-pro-1.9.0-1234664.el7.x86_64 3/15
Installing : ocl-icd-amdgpu-pro-20.50-1234664.el7.x86_64 4/15
Installing : clinfo-amdgpu-pro-20.50-1234664.el7.x86_64 5/15
Installing : libdrm-amdgpu-common-1.0.0-1234664.el7.noarch 6/15
Installing : 1:libdrm-amdgpu-2.4.100-1234664.el7.x86_64 7/15
Installing : hsakmt-roct-amdgpu-1.0.9-1234664.el7.x86_64 8/15
Installing : hsa-runtime-rocr-amdgpu-1.2.0-1234664.el7.x86_64 9/15
Installing : hip-rocr-amdgpu-pro-20.50-1234664.el7.x86_64 10/15
Installing : opencl-rocr-amdgpu-pro-20.50-1234664.el7.x86_64 11/15
Installing : 1:amdgpu-dkms-firmware-5.9.10.69-1234664.el7.noarch 12/15
Installing : 1:amdgpu-dkms-5.9.10.69-1234664.el7.noarch 13/15
Loading new amdgpu-5.9.10.69-1234664.el7 DKMS files...
Building for 3.10.0-1160.21.1.el7.x86_64
Building initial module for 3.10.0-1160.21.1.el7.x86_64
Done.
Forcing installation of amdgpu
amdgpu.ko.xz:
Running module version sanity check.
- Original module
- Found /lib/modules/3.10.0-1160.21.1.el7.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz
- Storing in /var/lib/dkms/amdgpu/original_module/3.10.0-1160.21.1.el7.x86_64/x86_64/
- Archiving for uninstallation purposes
- Installation
- Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/
amdttm.ko.xz:
Running module version sanity check.
- Original module
- Installation
- Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/
amdkcl.ko.xz:
Running module version sanity check.
- Original module
- Installation
- Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/
amd-sched.ko.xz:
Running module version sanity check.
- Original module
- Installation
- Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/
Adding any weak-modules
Running the post_install script:
depmod....
DKMS: install completed.
Installing : amdgpu-pro-rocr-opencl-20.50-1234664.el7.x86_64 14/15
Installing : opencl-orca-amdgpu-pro-icd-20.50-1234664.el7.x86_64 15/15
Verifying : comgr-amdgpu-pro-1.9.0-1234664.el7.x86_64 1/15
Verifying : 1:amdgpu-dkms-5.9.10.69-1234664.el7.noarch 2/15
Verifying : 1:libdrm-amdgpu-2.4.100-1234664.el7.x86_64 3/15
Verifying : amdgpu-pro-rocr-opencl-20.50-1234664.el7.x86_64 4/15
Verifying : amdgpu-core-20.50-1234664.el7.noarch 5/15
Verifying : libdrm-amdgpu-common-1.0.0-1234664.el7.noarch 6/15
Verifying : hip-rocr-amdgpu-pro-20.50-1234664.el7.x86_64 7/15
Verifying : opencl-orca-amdgpu-pro-icd-20.50-1234664.el7.x86_64 8/15
Verifying : ocl-icd-amdgpu-pro-20.50-1234664.el7.x86_64 9/15
Verifying : clinfo-amdgpu-pro-20.50-1234664.el7.x86_64 10/15
Verifying : 1:amdgpu-dkms-firmware-5.9.10.69-1234664.el7.noarch 11/15
Verifying : hsa-runtime-rocr-amdgpu-1.2.0-1234664.el7.x86_64 12/15
Verifying : amdgpu-pro-core-20.50-1234664.el7.noarch 13/15
Verifying : opencl-rocr-amdgpu-pro-20.50-1234664.el7.x86_64 14/15
Verifying : hsakmt-roct-amdgpu-1.0.9-1234664.el7.x86_64 15/15
Installed:
amdgpu-dkms.noarch 1:5.9.10.69-1234664.el7 amdgpu-pro-rocr-opencl.x86_64 0:20.50-1234664.el7
clinfo-amdgpu-pro.x86_64 0:20.50-1234664.el7 opencl-orca-amdgpu-pro-icd.x86_64 0:20.50-1234664.el7
Dependency Installed:
amdgpu-core.noarch 0:20.50-1234664.el7 amdgpu-dkms-firmware.noarch 1:5.9.10.69-1234664.el7
amdgpu-pro-core.noarch 0:20.50-1234664.el7 comgr-amdgpu-pro.x86_64 0:1.9.0-1234664.el7
hip-rocr-amdgpu-pro.x86_64 0:20.50-1234664.el7 hsa-runtime-rocr-amdgpu.x86_64 0:1.2.0-1234664.el7
hsakmt-roct-amdgpu.x86_64 0:1.0.9-1234664.el7 libdrm-amdgpu.x86_64 1:2.4.100-1234664.el7
libdrm-amdgpu-common.noarch 0:1.0.0-1234664.el7 ocl-icd-amdgpu-pro.x86_64 0:20.50-1234664.el7
opencl-rocr-amdgpu-pro.x86_64 0:20.50-1234664.el7
Complete!
Hi @userxx
Thank you for reporting it. It seems like the post was wrongly marked as "spam" and moved to the spam folder.
Could you please check whether the user is a member of the "render" or "video" group?
The AMDGPU-Pro installation document says:
"To use the ROCr implementation of OpenCL, the running user might need additional permissions. Usually the user must be added to the “render” group or to the “video” group. See the notes in OpenCL (Optional Component) for more details."
For more information, please refer to the section "OpenCL (Optional Component)" in the installation document.
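In case it helps, here is a minimal sketch of how the group membership could be checked from a shell. It assumes only the standard `id` utility; the helper name `has_group` is purely illustrative and not part of any AMD tooling, and whether a "render" group exists at all depends on the distribution.

```shell
#!/bin/sh
# Return 0 if the space-separated group list in $1 contains the group in $2.
# (has_group is a hypothetical helper, written only for this example.)
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Groups of the current user, as reported by id(1).
user_groups=$(id -nG 2>/dev/null || true)

for g in render video; do
    if has_group "$user_groups" "$g"; then
        echo "$g: member"
    else
        echo "$g: NOT a member"
    fi
done

# Manual follow-up steps (not run by this script):
#   getent group render video           # see which of the groups exist
#   sudo usermod -a -G video "$USER"    # add the user to an existing group
#   ls -l /dev/kfd /dev/dri/renderD*    # group ownership of the device nodes
```

If a group is reported as missing, add the user to whichever of the two groups exists on the system, then log out and back in so the new membership takes effect before running clinfo again.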
Thanks.