Wedge009
Adept II

Re: amdgpu-pro 20.45: ROCr vs PAL OpenCL breaks BOINC GPU processing

A minor update.

I discovered that a user with a Navi 10 GPU is apparently running BOINC okay with ROCr-based OpenCL from amdgpu-pro 20.45.

https://einsteinathome.org/goto/comment/184228

The difference between that set-up and mine is that the user has a Navi 10 GPU instead of a Vega 20. I doubt it's relevant, but they are also running a mixed set-up, with both ROCr and 'legacy' OpenCL installed.

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

The aforementioned user didn't appear to have success with ROCr-based OpenCL after all.

I have just tested Ubuntu kernel 5.8.0-45 with amdgpu-pro 20.50, and there's no apparent change with ROCr-based OpenCL. I gather the 20.50 release was focused more on supporting the newly released Radeon RX 6700 XT anyway.

Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx906" by: Advanced Micro Devices, Inc.
Max allocation limit: 14360458035
Global mem size: 17163091968
OpenCL device has FP64 support
...
Warning:  Program terminating, but clFFT resources not freed.
Please consider explicitly calling clfftTeardown( ).
...
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx906+sram-ecc" by: Advanced Micro Devices, Inc.
Max allocation limit: 14588628168
Global mem size: 17163091968
Couldn't create OpenCL command queue (error: -6)!
OpenCL shutdown complete!
initialize_ocl returned error [2013]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
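
For anyone decoding the log above: the -6 returned when creating the command queue is, per the Khronos CL/cl.h header, CL_OUT_OF_HOST_MEMORY. The Python snippet below is only a sketch covering a handful of standard OpenCL error codes; the regex assumes the exact "(error: N)" wording this application prints, and the function name decode_opencl_errors is hypothetical, not part of any OpenCL or BOINC tool.

```python
import re

# A subset of the standard OpenCL error codes, as defined in Khronos CL/cl.h.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Assumption: errors are printed as "(error: <int>)", as in the log above.
ERROR_PATTERN = re.compile(r"\(error:\s*(-?\d+)\)")


def decode_opencl_errors(log_text):
    """Return (code, symbolic name) pairs for every '(error: N)' in log_text."""
    results = []
    for match in ERROR_PATTERN.finditer(log_text):
        code = int(match.group(1))
        results.append((code, OPENCL_ERRORS.get(code, "unknown OpenCL error")))
    return results


if __name__ == "__main__":
    line = "Couldn't create OpenCL command queue (error: -6)!"
    print(decode_opencl_errors(line))
```

The bracketed values in the same log ([2013], [5]) are not in the "(error: N)" form, so the sketch deliberately ignores them rather than guessing what they mean.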

To clarify, this is what BOINC reports with ROCr-based OpenCL:

OpenCL: AMD/ATI GPU 0: Vega 20 [Radeon VII] (driver version 3224.0 (HSA1.1,LC), device version OpenCL 2.0, 16368MB, 16368MB available, 13832 GFLOPS peak)
OpenCL: AMD/ATI GPU 1: Vega 20 [Radeon VII] (driver version 3224.0 (HSA1.1,LC), device version OpenCL 2.0, 16368MB, 16368MB available, 13832 GFLOPS peak)

I've since reverted to kernel 5.4.0-54 with amdgpu-pro 20.40 in order to get PAL-based OpenCL back:

OpenCL: AMD/ATI GPU 0: AMD Radeon VII (driver version 3180.7 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (3180.7), 16368MB, 16368MB available, 13832 GFLOPS peak)
OpenCL: AMD/ATI GPU 1: AMD Radeon VII (driver version 3180.7 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (3180.7), 16368MB, 16368MB available, 13832 GFLOPS peak)
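
The quickest way to tell the two stacks apart in BOINC's start-up messages is the tag list after the driver version: "(HSA1.1,LC)" for ROCr versus "(PAL,HSAIL)" for PAL. A minimal Python sketch of that check, assuming the line format matches the four lines quoted above; classify_opencl_backend is a hypothetical helper and the classification is a heuristic based only on these examples.

```python
import re

# Assumption: backend tags appear as "driver version <ver> (<tags>)",
# exactly as in the BOINC start-up lines quoted above.
DRIVER_PATTERN = re.compile(r"driver version ([\d.]+) \(([^)]*)\)")


def classify_opencl_backend(boinc_line):
    """Return (driver_version, backend) for a BOINC 'OpenCL: ... GPU' line.

    backend is 'PAL' if the tags include PAL, 'ROCr' if any tag starts
    with HSA (as in the ROCr lines above), otherwise 'unknown'.
    Returns None if the line has no driver-version field.
    """
    match = DRIVER_PATTERN.search(boinc_line)
    if match is None:
        return None
    version, tags = match.group(1), match.group(2)
    tag_list = [t.strip() for t in tags.split(",")]
    # Check PAL first: PAL lines also carry an HSAIL tag, which starts with "HSA".
    if "PAL" in tag_list:
        return version, "PAL"
    if any(t.startswith("HSA") for t in tag_list):
        return version, "ROCr"
    return version, "unknown"
```

The order of the checks is the main design point: testing for an HSA prefix first would misclassify the PAL lines, because their HSAIL tag also starts with "HSA".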

Is there no possibility of bringing back PAL-based OpenCL, even as a 'legacy' option?

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

Reporting no change with the recent amdgpu-pro 21.10 release: ROCr-based OpenCL is still not compatible with BOINC-based GPU processing. Tested on Ubuntu kernel 5.8.0-50.

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

Reporting on the recently released amdgpu-pro 21.20, which brought a bit of a surprise, since the release notes say the main change was only to add support for RHEL and SLED. The application I'm running no longer crashes at the start of a GPU-based task, as it did with amdgpu-pro 20.45 through 21.10 (inclusive), where ROCr-based OpenCL was enforced.

However, it was too early to celebrate: instead of crashing, the application simply got stuck in the initialisation phase of a job. With the old PAL-based OpenCL, a single job would complete in around 3-4 minutes. After 35 minutes, the task still hadn't progressed past initialisation, so I gave up and reverted to amdgpu-pro 20.40 yet again.

This is still not at a stage where I could consider ROCr a functional replacement for PAL-based OpenCL, but I suppose being stuck is a marginal improvement over crashing immediately. Or, from a responsiveness perspective, a halt could be considered worse than an immediate crash.

Tested on Ubuntu 20.04.2 HWE kernel 5.8.0-55.

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

Reporting no apparent difference with amdgpu-pro 21.30 (which appears to be updated only for Ubuntu 20.04.3). That is, ROCr-based OpenCL still stalls (though no longer crashes) any attempt to use OpenCL under BOINC.

I'm still reverting to amdgpu-pro 20.40, the last package with PAL-based OpenCL.

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

At long last, I managed to get BOINC running with ROCm (instead of amdgpu-pro): specifically, ROCm 4.3 on Vega10 with Ubuntu kernel 5.11.0-27.

To get clinfo (and BOINC) to recognise the GPU, I had to manually edit /etc/OpenCL/vendors/amdocl64_40300.icd so that it contains the absolute path /opt/rocm/opencl/lib/libamdocl64.so (setting LD_LIBRARY_PATH=/opt/rocm/opencl/lib was not sufficient).
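
Some background on why that edit matters: on Linux, the OpenCL ICD loader scans /etc/OpenCL/vendors for *.icd files, each naming one vendor library to load, and a bare file name (no slash) is left to the dynamic linker's search path. The Python snippet below is a minimal sketch for listing what each ICD file points to and whether an absolute path actually exists, assuming the one-library-per-file layout; inspect_icd_files is a hypothetical helper, not part of any AMD, ROCm or Khronos tool.

```python
import os
from pathlib import Path

# Default vendor directory scanned by Linux OpenCL ICD loaders.
DEFAULT_VENDOR_DIR = "/etc/OpenCL/vendors"


def inspect_icd_files(vendor_dir=DEFAULT_VENDOR_DIR):
    """Return a list of dicts describing each *.icd file in vendor_dir.

    Each entry records the library name the file points to, whether that
    name is an absolute path, and (for absolute paths only) whether the
    library file exists. Relative names are resolved by the dynamic
    linker at load time, so they are not checked here.
    """
    report = []
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return report
    for icd in sorted(directory.glob("*.icd")):
        # Use the first non-empty line as the library name.
        lines = [ln.strip() for ln in icd.read_text().splitlines() if ln.strip()]
        library = lines[0] if lines else ""
        absolute = os.path.isabs(library)
        report.append({
            "icd": icd.name,
            "library": library,
            "absolute": absolute,
            "exists": os.path.isfile(library) if absolute else None,
        })
    return report


if __name__ == "__main__":
    for entry in inspect_icd_files():
        print(entry)
```

It only reads files, so it can be run as a normal user before and after editing an ICD file to confirm the change took effect.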

Unfortunately, the result matches what I get with the ROCr-based OpenCL in amdgpu-pro 21.20 and 21.30: stalled execution and no actual GPU usage. I've seen reports that ROCm works okay on Polaris GPUs, but I don't have any of those...

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

I recently learned of a user who has successfully got BOINC working. They have the same Threadripper 3960X CPU and the same Radeon VII GPU; the difference is that they are using Arch Linux with a self-compiled ROCm 4.3.1, while I'm using Ubuntu, without success, with either amdgpu-pro 21.30 or the official ROCm 4.3.0 packages. I've yet to hear how they got their ROCm set-up working, but I wonder whether there's an issue somewhere in the Ubuntu packaging for amdgpu-pro and/or ROCm...

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

What a pleasant surprise. I got ROCr-based OpenCL from amdgpu-install 21.40.1 running successfully on one host with Vega10, with no hacks or work-arounds needed. All I did was remove amdgpu-pro 20.40, update to the current HWE kernel, and run

amdgpu-install --opencl=rocr

(with reboots in between).

Hopefully that bodes well for my other two hosts (one with Vega10, the other with Vega20), which are still stuck on kernel 5.4. If updating them is successful, I'll be sure to mark this as solved.

Wedge009
Adept II

Re: amdgpu-pro 20.45+: ROCr vs PAL OpenCL breaks BOINC GPU processing

It turns out that the struggles I was having with Vega20 were apparently due to a hardware fault in one of the cards: it isn't reliable, but neither is it 100% broken. Unfortunate for me, but fortunate in terms of the software support situation. Closing this as resolved with the release of amdgpu 21.40.1.40501.

(This post kept being blocked as 'spam' for some reason; is something wrong with the forum's filtering? Attempting to post again for the umpteenth time...)