
OpenCL with SVM extensions on Linux for modern APUs?

Question asked by epvbergen on Sep 16, 2018
Latest reply on Sep 21, 2018 by epvbergen



I'm evaluating OpenCL-accelerated OpenCV on the Ryzen Embedded V1807B (a Raven Ridge APU) and am wondering what options I have for getting SVM (shared virtual memory) support on Linux on an APU.


It seems there are multiple approaches:

- a fully open stack: Linux 4.18+ with the Raven Ridge KFD patches, amdgpu and Mesa 18.1.6 -> works, but only OpenCL 1.1 (Clover) and no SVM. Not a solution.

- the AMDGPU-PRO stack: Linux 4.18+ with the Raven Ridge KFD patches and AMDGPU-PRO 18.30 -> works, but only OpenCL 1.2 without SVM extensions, and it is slow in OpenCV.

- the official support for V1000: Linux 4.14 with AMDGPU driver 2018.20.818 -> works, but again only OpenCL 1.2 without SVM extensions, with the same slowness in OpenCV.

- ROCm-based OpenCL -> Raven Ridge is not supported in ROCm 1.8, APUs are not on the ROCm roadmap, and APU support seems to have ended with Carrizo and Kaveri.
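What separates these stacks is what clGetDeviceInfo reports for CL_DEVICE_VERSION and, on OpenCL 2.0 and later, CL_DEVICE_SVM_CAPABILITIES (the clinfo utility lists the same information). Below is a minimal sketch of how to interpret those two values, assuming they have already been read from the driver through the C API or a binding. They are passed in as plain values here so the sketch runs without an OpenCL runtime. The helper names (parse_device_version, decode_svm_capabilities, summarize_device) are hypothetical, and the version strings in the usage note are illustrative, not captured from the drivers above.

```python
import re

# Bit values of cl_device_svm_capabilities as defined in the OpenCL 2.0 headers (CL/cl.h).
SVM_FLAGS = {
    1 << 0: "CL_DEVICE_SVM_COARSE_GRAIN_BUFFER",
    1 << 1: "CL_DEVICE_SVM_FINE_GRAIN_BUFFER",
    1 << 2: "CL_DEVICE_SVM_FINE_GRAIN_SYSTEM",
    1 << 3: "CL_DEVICE_SVM_ATOMICS",
}


def parse_device_version(version):
    """Return (major, minor) from a CL_DEVICE_VERSION string.

    The spec defines the format as 'OpenCL<space><major>.<minor><space><vendor info>'.
    """
    m = re.match(r"OpenCL (\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unexpected CL_DEVICE_VERSION: {version!r}")
    return int(m.group(1)), int(m.group(2))


def decode_svm_capabilities(bits):
    """List the SVM capability names set in a cl_device_svm_capabilities bitfield."""
    return [name for flag, name in sorted(SVM_FLAGS.items()) if bits & flag]


def summarize_device(version, svm_bits=0):
    """Hypothetical helper: describe what a device offers in terms of core SVM.

    SVM is core functionality only from OpenCL 2.0 onward, so for a 1.x device
    svm_bits is ignored (the CL_DEVICE_SVM_CAPABILITIES query does not exist there).
    """
    major, minor = parse_device_version(version)
    if (major, minor) < (2, 0):
        return f"OpenCL {major}.{minor}: no core SVM"
    caps = decode_svm_capabilities(svm_bits)
    detail = ", ".join(caps) if caps else "no SVM capabilities reported"
    return f"OpenCL {major}.{minor}: {detail}"
```

For example, `summarize_device("OpenCL 1.2 example-vendor-info")` returns `"OpenCL 1.2: no core SVM"`, which is the situation with every stack listed above. A 2.0 device reporting `0b0011` would yield coarse- and fine-grain buffer SVM. The version check comes first because on a 1.x platform there is no SVM bitfield to decode at all.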


If I understand the situation correctly then:

- support for OpenCL 2.0 on Linux ended with the 2014 release of Catalyst 15.1, before the compiler in AMDGPU-PRO could offer OpenCL 2.0.

- support for OpenCL 1.2 with the SC compiler ended with AMDGPU-PRO 17.50, before the LLVM compiler offered the same performance and correctness (see the reports from the coin miners).

- support for packed FP16 is no longer planned; see the thread "Disappointing opencl half-precision performance on vega - any advice?"

- support for ROCm on APUs ended with ROCm 1.6, before gfx902/gfx903 (Raven Ridge) was supported, even though Raven Ridge, with the Ryzen 2400G et al., is the first mainstream APU in a long time.


If I had to write a very depressing general summary for Linux: OpenCL goes from 2.0 to 1.2, then to 1.2 with problems. The main attention for SVM support moves from AMD to Intel. Heterogeneous computing with ROCm moves from APUs to dGPUs. Packed FP16 is dropped, despite support in the chips and the boost it can give to DL.


(It's such a pity regarding ROCm: everything seems to be in place, if only someone would update the closed-source code. I even got vector_copy to work by patching ROCr to fake a gfx900 instead of a gfx902, but alas, ROCm-TensorFlow still crashed halfway through.)


So what's the plan? I am really enthusiastic about the promise that modern APUs hold for accelerated DL inference and computer vision, but how do I convince my colleagues to pass on the Jetson and the Movidius when satisfying their appetite for AI at the edge?