OpenCL

epvbergen · 09-16-2018

Hi,

I'm evaluating OpenCL-accelerated OpenCV on V1807B (Raven Ridge APU) and am wondering what options I have to get SVM support on Linux on APU.

It seems there are multiple approaches:

- a fully open stack: Linux 4.18+ with Raven Ridge kfd patches, amdgpu, mesa 18.1.6 -> works, but only OpenCL 1.1 (clover), no SVM. No solution.

- the AMDGPU-PRO stack: Linux 4.18+ with Raven Ridge kfd patches, amdgpu-pro 18.30 -> works, but OpenCL 1.2 without SVM extensions. Slow in OpenCV.

- the official support for V1000: Linux 4.14 with AMDGPU driver 2018.20.818 -> works, but OpenCL 1.2 without SVM extensions. Same result: slow in OpenCV.

- ROCm-based OpenCL -> Raven Ridge not supported in ROCm 1.8, APUs not in roadmap for ROCm, APU support seems to have ended with Carrizo and Kaveri.
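To make the comparison above concrete, here is a minimal sketch of how the reported version string alone already rules out core SVM on the first three stacks. It relies on the OpenCL specification's format for CL_DEVICE_VERSION ("OpenCL <major>.<minor> <vendor-specific information>") and on SVM having been introduced in OpenCL 2.0. The function names and the sample strings are made up for illustration; on a real device you would still query CL_DEVICE_SVM_CAPABILITIES through clGetDeviceInfo to see which SVM level is actually offered.

```python
import re


def parse_cl_version(version_string):
    """Return (major, minor) from a CL_DEVICE_VERSION-style string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def may_offer_svm(version_string):
    """SVM entered the core API in OpenCL 2.0, so anything older cannot offer it as core."""
    return parse_cl_version(version_string) >= (2, 0)


# Hypothetical version strings, shaped like the ones the stacks above would report.
for reported in (
    "OpenCL 1.1 Mesa 18.1.6",
    "OpenCL 1.2 AMD-APP (0000.0)",
    "OpenCL 2.0 AMD-APP (0000.0)",
):
    print(reported, "->", "SVM possible" if may_offer_svm(reported) else "no core SVM")
```

A True result only means SVM is possible; it says nothing about fine-grained or system SVM, which is what matters most on an APU.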

If I understand the situation correctly then:

- support for OpenCL 2.0 on Linux has ended with the 2014 release of Catalyst 15.1, before the compiler in AMDGPU-PRO could offer OpenCL 2.0.

- support for OpenCL 1.2 with the SC compiler ended with AMDGPU-PRO 17.50, before the LLVM compiler offered the same performance and correctness (see the reports from the coin miners).

- support for packed FP16 is not planned anymore, see the thread "Disappointing opencl half-precision performance on vega - any advice?"

- support for ROCm on APU ended with ROCm 1.6, before gfx902/gfx903 (Raven Ridge) was supported, the first mainstream APU in a long time with the Ryzen 2400G et al.

If I want to make a very depressing general summary for Linux, OpenCL goes from 2.0 to 1.2, to 1.2 with problems. Main attention for SVM support goes from AMD to Intel. Heterogeneous computing with ROCm goes from APU to dGPU. Packed FP16 is dropped, despite support in the chips and the boost it can give to DL.

(It's such a pity regarding ROCm, everything seems in place, if only someone would update the closed source libhsa-ext-finalize64.so. I even got vector_copy to work by patching ROCr to fake a gfx900 instead of gfx902 to libhsa-ext-finalize64.so, but alas, ROCm-Tensorflow still crashed half way through).

So what's the plan? I am really enthusiastic about the promise that modern APUs could hold for accelerated DL inference and computer vision, but how do I convince my colleagues to avoid the Jetson and the Movidius to satisfy their appetite for AI-at-the-edge?

(gstoner?)

gstoner · 09-17-2018

There were many reasons AMD sunsetted the Catalyst Linux driver. The VP of Engineering and the Corporate Fellows at the time decided to move to a common Linux driver core foundation based entirely on AMDGPU. That meant a multi-year rebuild of the foundation, since it was missing capabilities. Along the way some features were deprecated or regressed, so that the team had time to rebuild it. OpenCL 2.0 was always planned to be made whole on Linux for APU and dGPU. Remember that the Catalyst driver had its challenges with the Linux community and our customers.

On SVM, Catalyst had many shortcomings in its design. It supported a maximum of 4 GB of memory, and on dGPU you sliced that 4 GB by the number of GPUs in the system, which was a bit of an issue as we built GPUs with 16 and 32 GB of memory. Addressing this was one of the things we were working on with ROCm. It was also an issue with APU + dGPU combos. There were a few more issues architected into the Catalyst driver that we had to deal with. For example, it only supported a 4 GB maximum memory allocation, and it was not until the last release that the driver was fixed to support larger allocations by chaining multiple 4 GB regions into one larger virtual allocation. There are a number of other architectural challenges that affected even what you want on the APU, but I will leave it there.
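To put numbers on the 4 GB limit described above, here is a small illustrative sketch. It is plain arithmetic, not driver code: the even split across GPUs and the helper names are assumptions made for the example, and the actual Catalyst chaining mechanism is not described in this thread.

```python
GIB = 1024 ** 3
WINDOW = 4 * GIB  # the 4 GB limit described in the post


def per_gpu_share(num_gpus, window=WINDOW):
    """Bytes available to each GPU if one window is sliced evenly (assumed even split)."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return window // num_gpus


def chain_regions(total_bytes, region=WINDOW):
    """Sizes of the regions needed to cover one larger virtual allocation."""
    if total_bytes < 0:
        raise ValueError("allocation size must be non-negative")
    full, rest = divmod(total_bytes, region)
    return [region] * full + ([rest] if rest else [])


# Four GPUs sharing one window, and a 16 GiB buffer built from chained regions.
print(per_gpu_share(4) // 2 ** 20, "MiB per GPU")
print(len(chain_regions(16 * GIB)), "regions for a 16 GiB allocation")
```

With four GPUs each one is left with a quarter of the window, which shows why this design became a problem once cards shipped with 16 and 32 GB of memory.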

Over the last two and a half years, we had to make some tough priority calls due to the size of the GPU compute engineering team. We looked at which OpenCL applications in the market also ran on Linux. Also, the market on POSIX-based OSes (Mac OS X and Linux (aka NVIDIA)) never advanced beyond the common denominator of OpenCL 1.2, so we worked to make sure we delivered this at minimum:

- AMDGPUpro is for broad-market support, to support all CPUs with PCIe Gen1, Gen2, Gen3, etc.

- So GFX8 and older never moved to the ROCm-based driver foundation. They also never moved to the new LLVM compiler; they stayed on the LLVM/HSAIL/SC compiler, the same one that Catalyst used and that our Windows driver uses.

- GFX9 aka Vega10 is the only one whose driver supports the ROCm base driver foundation.

- With 17.50 it moved to the same LLVM/HSAIL/SC compiler as the Windows driver, and

- With 18.20 it moved to OpenCL on PAL (the same foundation as Vulkan), with the LLVM/HSAIL/SC compiler that the Windows driver uses.

- The ROCm project's primary focus was advanced GPU computing languages, HPC, and machine/deep learning. Because of this, ROCm uses more advanced platform features such as PCIe atomics to support signals, which is why we need PCIe Gen3 lanes on the CPU PCIe root complex, which is where server-based GPUs are placed.

- The ROCm AMDGPU LLVM compiler supports OpenCL 2.0 kernels on the OpenCL 1.2 runtime today, and will support full OpenCL 2.0 with packed-math float16 operations.

- On ROCm we had a strong drive to get to a pure open-source solution, for which the LLVM/HSAIL/SC compiler was a big issue. Plus, for this project it had a number of shortcomings that we were trying to address for the HPC and deep learning market with the new compiler. Assembler support was critical for our library programs: you can see rocBLAS now hits 94% efficiency on Vega10 for large square matrices on SGEMM, and it was critical for getting MIOpen to its performance level on Vega10.

As you know, in the community of OpenCL adopters the common denominator is OpenCL 1.2, not OpenCL 2.0. Only AMD and Intel moved forward here, and we do have full support for OpenCL 2.0 on Windows.

Now, HSA/ROCm and APUs. Due to an early architecture decision, made before I ran the team, they use a particular extension in the SBIOS that extends the SRAT with topology info into a file called a CRAT. We had a large number of issues due to OEMs/ODMs not populating this correctly. It is something the Linux and ROCr teams have been revisiting.

I am sorry this impacted your work, but please be patient as we rebuild the core stack to get to the level of capability that meets your expectations. We are working to bring OpenCL 2.0 to both AMDGPUpro and ROCm on Linux, but remember it is more than SVM: we have many users who want Device Enqueue. The team has been working on this feature, since it was another one that did not work well under Catalyst.

Also, the team has been working on Raven support for ROCm; it has just taken a bit longer to get all the foundation we need in place.

A lot of this has taken longer than we wanted, but it is all coming back with a better foundation. A big thing is that the GPU Computing team and the Linux team are now one team under a new VP of Engineering, which should speed all this up. The one thing we should have done better is communicate to the community, earlier, the changes we were making and why.

I will leave you with this: full OpenCL 2.0 support will release within the next 6 months.

Thanks

Gregory Stoner
