
userxx

RX 5700 XT OpenCL freezes during computation and in clinfo

Hi,

I am using CentOS 7. I was using the amdgpu driver, but I don't remember whether it was the Pro version or not. OpenCL worked... until a computation ran long enough (1-2 s); then it froze, and sometimes the whole machine froze, leaving a hard reset as the only solution. Awful.

So I wanted to try the new drivers, just in case... and now it is even worse: I cannot list devices with clinfo, nor execute the simplest kernel.

What is going on?

I am only trying to update because the GPU eventually freezes when I run kernels... and now it cannot even start.


$ clinfo
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (3224.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx1010
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0
  Driver Version                                  3224.0 (HSA1.1,LC)
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
  Device Topology (AMD)                           PCI-E, 85:00.0
  Max compute units                               20
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                32
  SIMD instruction width (AMD)                    1
  Max clock frequency                             2100MHz
  Graphics IP (AMD)                               10.1
  Device Partition                                (core)
    Max number of sub-devices                     20
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Compiler Available                              Yes
  Linker Available                                Yes
^C
Program received signal SIGINT, Interrupt.
0x00007ffff05a22f1 in rocr::core::InterruptSignal::WaitRelaxed(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
(gdb) bt
#0  0x00007ffff05a22f1 in rocr::core::InterruptSignal::WaitRelaxed(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#1  0x00007ffff05a21aa in rocr::core::InterruptSignal::WaitAcquire(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#2  0x00007ffff0599cf9 in rocr::HSA::hsa_signal_wait_scacquire(hsa_signal_s, hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#3  0x00007ffff057c500 in rocr::AMD::BlitKernel::SubmitLinearCopyCommand(void*, void const*, unsigned long) ()
   from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#4  0x00007ffff059048a in rocr::(anonymous namespace)::RegionMemory::Freeze() ()
   from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#5  0x00007ffff05b5484 in rocr::amd::hsa::loader::Segment::Freeze() [clone .part.0] ()
   from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#6  0x00007ffff05b54f6 in rocr::amd::hsa::loader::ExecutableImpl::Freeze(char const*) ()
   from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#7  0x00007ffff05b4c56 in rocr::amd::hsa::loader::AmdHsaCodeLoader::FreezeExecutable(rocr::amd::hsa::loader::Executable*, char const*) () from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#8  0x00007ffff059c528 in rocr::HSA::hsa_executable_freeze(hsa_executable_s, char const*) ()
   from /opt/amdgpu-pro/lib64/libhsa-runtime64.so.1
#9  0x00007ffff7f6edb8 in roc::LightningProgram::setKernels(amd::option::Options*, void*, unsigned long, int, unsigned long, std::string) () from /opt/amdgpu-pro/lib64/libamdocl64.so
#10 0x00007ffff7f660ac in device::Program::linkImplLC(amd::option::Options*) ()
   from /opt/amdgpu-pro/lib64/libamdocl64.so
#11 0x00007ffff7f66b11 in device::Program::build(std::string const&, char const*, amd::option::Options*) ()
   from /opt/amdgpu-pro/lib64/libamdocl64.so
#12 0x00007ffff7f25759 in amd::Program::build(std::vector<amd::Device*, std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool, bool) () from /opt/amdgpu-pro/lib64/libamdocl64.so
#13 0x00007ffff7efbf34 in clBuildProgram () from /opt/amdgpu-pro/lib64/libamdocl64.so
#14 0x0000000000405a76 in getWGsizes ()
#15 0x0000000000405f6c in device_info_wg ()
#16 0x0000000000406f9e in printDeviceInfo ()
#17 0x0000000000407ba2 in showDevices ()
#18 0x00000000004014dc in main ()


It freezes at this point.


I am currently on version 20.50, which freezes in clinfo or with the simplest kernel. Previously I had 20.20 or 20.40 (I don't remember which), and it froze during execution (e.g. a 1024x1024 matrix multiplication could finish, but a 2048x2048 one could not).

I tried different installations:

- ./amdgpu-install -y --opencl=legacy,rocr --headless

- ./amdgpu-pro-install -y --opencl=legacy,rocr --headless

- ./amdgpu-pro-install -y --opencl=legacy --headless

- ./amdgpu-pro-install -y --opencl=rocr --headless

Example of the packages installed with pro and rocr:

 amdgpu-core                       noarch           20.50-1234664.el7                 @amdgpu-pro-local           0.0
 amdgpu-dkms                       noarch           1:5.9.10.69-1234664.el7           @amdgpu-pro-local           224 M
 amdgpu-dkms-firmware              noarch           1:5.9.10.69-1234664.el7           @amdgpu-pro-local            43 M
 amdgpu-pro-versionlist            noarch           20.50-1234664.el7                 @amdgpu-pro-local           1.8 k
 amdgpu-versionlist                noarch           20.50-1234664.el7                 @amdgpu-pro-local           2.0 k
 amdgpu-pro-core                   noarch           20.50-1234664.el7                 @amdgpu-pro-local           9.3 k
 amdgpu-pro-rocr-opencl            x86_64           20.50-1234664.el7                 @amdgpu-pro-local           9.3 k
 clinfo-amdgpu-pro                 x86_64           20.50-1234664.el7                 @amdgpu-pro-local           179 k
 comgr-amdgpu-pro                  x86_64           1.9.0-1234664.el7                 @amdgpu-pro-local           126 M
 hip-rocr-amdgpu-pro               x86_64           20.50-1234664.el7                 @amdgpu-pro-local           2.3 M
 hsa-runtime-rocr-amdgpu           x86_64           1.2.0-1234664.el7                 @amdgpu-pro-local           2.4 M
 hsakmt-roct-amdgpu                x86_64           1.0.9-1234664.el7                 @amdgpu-pro-local           624 k
 libdrm-amdgpu                     x86_64           1:2.4.100-1234664.el7             @amdgpu-pro-local           230 k
 libdrm-amdgpu-common              noarch           1.0.0-1234664.el7                 @amdgpu-pro-local           8.5 k
 ocl-icd-amdgpu-pro                x86_64           20.50-1234664.el7                 @amdgpu-pro-local            53 k
 opencl-rocr-amdgpu-pro            x86_64           20.50-1234664.el7                 @amdgpu-pro-local           1.7 M


Example of an installation on CentOS 7.9:

$ ~/a/amdgpu-pro-20.50-1234664-rhel-7.9 ❯❯❯ sudo ./amdgpu-pro-install --headless --opencl=legacy,rocr
[amdgpu-pro-local]
name=AMD amdgpu Pro local repository
baseurl=file:///var/opt/amdgpu-pro-local
enabled=1
gpgcheck=0

...

Total                                                                                   151 MB/s |  85 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : amdgpu-core-20.50-1234664.el7.noarch                                                                1/15
  Installing : amdgpu-pro-core-20.50-1234664.el7.noarch                                                            2/15
  Installing : comgr-amdgpu-pro-1.9.0-1234664.el7.x86_64                                                           3/15
  Installing : ocl-icd-amdgpu-pro-20.50-1234664.el7.x86_64                                                         4/15
  Installing : clinfo-amdgpu-pro-20.50-1234664.el7.x86_64                                                          5/15
  Installing : libdrm-amdgpu-common-1.0.0-1234664.el7.noarch                                                       6/15
  Installing : 1:libdrm-amdgpu-2.4.100-1234664.el7.x86_64                                                          7/15
  Installing : hsakmt-roct-amdgpu-1.0.9-1234664.el7.x86_64                                                         8/15
  Installing : hsa-runtime-rocr-amdgpu-1.2.0-1234664.el7.x86_64                                                    9/15
  Installing : hip-rocr-amdgpu-pro-20.50-1234664.el7.x86_64                                                       10/15
  Installing : opencl-rocr-amdgpu-pro-20.50-1234664.el7.x86_64                                                    11/15
  Installing : 1:amdgpu-dkms-firmware-5.9.10.69-1234664.el7.noarch                                                12/15
  Installing : 1:amdgpu-dkms-5.9.10.69-1234664.el7.noarch                                                         13/15
Loading new amdgpu-5.9.10.69-1234664.el7 DKMS files...
Building for 3.10.0-1160.21.1.el7.x86_64
Building initial module for 3.10.0-1160.21.1.el7.x86_64
Done.
Forcing installation of amdgpu

amdgpu.ko.xz:
Running module version sanity check.
 - Original module
   - Found /lib/modules/3.10.0-1160.21.1.el7.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz
   - Storing in /var/lib/dkms/amdgpu/original_module/3.10.0-1160.21.1.el7.x86_64/x86_64/
   - Archiving for uninstallation purposes
 - Installation
   - Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/

amdttm.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/

amdkcl.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/

amd-sched.ko.xz:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/3.10.0-1160.21.1.el7.x86_64/extra/
Adding any weak-modules

Running the post_install script:

depmod....

DKMS: install completed.
  Installing : amdgpu-pro-rocr-opencl-20.50-1234664.el7.x86_64                                                    14/15
  Installing : opencl-orca-amdgpu-pro-icd-20.50-1234664.el7.x86_64                                                15/15
  Verifying  : comgr-amdgpu-pro-1.9.0-1234664.el7.x86_64                                                           1/15
  Verifying  : 1:amdgpu-dkms-5.9.10.69-1234664.el7.noarch                                                          2/15
  Verifying  : 1:libdrm-amdgpu-2.4.100-1234664.el7.x86_64                                                          3/15
  Verifying  : amdgpu-pro-rocr-opencl-20.50-1234664.el7.x86_64                                                     4/15
  Verifying  : amdgpu-core-20.50-1234664.el7.noarch                                                                5/15
  Verifying  : libdrm-amdgpu-common-1.0.0-1234664.el7.noarch                                                       6/15
  Verifying  : hip-rocr-amdgpu-pro-20.50-1234664.el7.x86_64                                                        7/15
  Verifying  : opencl-orca-amdgpu-pro-icd-20.50-1234664.el7.x86_64                                                 8/15
  Verifying  : ocl-icd-amdgpu-pro-20.50-1234664.el7.x86_64                                                         9/15
  Verifying  : clinfo-amdgpu-pro-20.50-1234664.el7.x86_64                                                         10/15
  Verifying  : 1:amdgpu-dkms-firmware-5.9.10.69-1234664.el7.noarch                                                11/15
  Verifying  : hsa-runtime-rocr-amdgpu-1.2.0-1234664.el7.x86_64                                                   12/15
  Verifying  : amdgpu-pro-core-20.50-1234664.el7.noarch                                                           13/15
  Verifying  : opencl-rocr-amdgpu-pro-20.50-1234664.el7.x86_64                                                    14/15
  Verifying  : hsakmt-roct-amdgpu-1.0.9-1234664.el7.x86_64                                                        15/15

Installed:
  amdgpu-dkms.noarch 1:5.9.10.69-1234664.el7             amdgpu-pro-rocr-opencl.x86_64 0:20.50-1234664.el7
  clinfo-amdgpu-pro.x86_64 0:20.50-1234664.el7           opencl-orca-amdgpu-pro-icd.x86_64 0:20.50-1234664.el7

Dependency Installed:
  amdgpu-core.noarch 0:20.50-1234664.el7                    amdgpu-dkms-firmware.noarch 1:5.9.10.69-1234664.el7
  amdgpu-pro-core.noarch 0:20.50-1234664.el7                comgr-amdgpu-pro.x86_64 0:1.9.0-1234664.el7
  hip-rocr-amdgpu-pro.x86_64 0:20.50-1234664.el7            hsa-runtime-rocr-amdgpu.x86_64 0:1.2.0-1234664.el7
  hsakmt-roct-amdgpu.x86_64 0:1.0.9-1234664.el7             libdrm-amdgpu.x86_64 1:2.4.100-1234664.el7
  libdrm-amdgpu-common.noarch 0:1.0.0-1234664.el7           ocl-icd-amdgpu-pro.x86_64 0:20.50-1234664.el7
  opencl-rocr-amdgpu-pro.x86_64 0:20.50-1234664.el7

Complete!

dipak

Hi @userxx 

Thank you for reporting it. It seems like the post was wrongly marked as "spam" and moved to the spam folder.

Could you please check whether the user is a member of the "render" or "video" group?

The AMDGPU-Pro installation document says:

"To use the ROCr implementation of OpenCL, the running user might need additional permissions. Usually the user must be added to the “render” group or to the “video” group. See the notes in OpenCL (Optional Component) for more details."

For more information, please refer to the section "OpenCL (Optional Component)" in the installation document.
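A minimal shell sketch of that check follows. It assumes a POSIX shell with the standard `id` and `getent` tools; the function name `has_gpu_group` is hypothetical, chosen only for illustration. It reports whether the given user (default: the current user) is in either group, and if not, prints a `usermod` command to try. Because a "render" group may not exist on every distribution, it only suggests that group when `getent` finds it.

```shell
#!/bin/sh
# Sketch: check whether a user belongs to the "render" or "video" group,
# which the AMDGPU-Pro docs say ROCr OpenCL may need.
# has_gpu_group is a hypothetical helper name, not part of any AMD tool.

# has_gpu_group: exit status 0 if the space-separated group list in $1
# contains "render" or "video", 1 otherwise.
has_gpu_group() {
    for g in $1; do
        case "$g" in
            render|video) return 0 ;;
        esac
    done
    return 1
}

# Check the user named as the first script argument, else the current user.
user="${1:-$(id -un)}"

if has_gpu_group "$(id -nG "$user")"; then
    echo "$user is in the render or video group"
else
    # Only suggest "render" if that group actually exists on this system.
    if getent group render >/dev/null 2>&1; then
        groups_to_add="render,video"
    else
        groups_to_add="video"
    fi
    echo "$user is in neither group; one possible fix:"
    echo "  sudo usermod -a -G $groups_to_add $user"
    echo "Log out and back in afterwards so the new membership applies."
fi
```

Running it as `sh check_groups.sh` (hypothetical file name) checks the current user; `sh check_groups.sh someuser` checks another account.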


Thanks.
