I am planning to ship software that uses binary kernels with inline asm. I therefore decided to go with LLVM-based offline compilation, since the built-in PAL compiler cannot handle this. It is important for me to support as many platforms as possible. I would love to target only ROCm, because I believe it is the more mature platform and the way of the future, but unfortunately I have a lot of Windows clients, or clients that connect their cards over PCIe 2.0, so for them I need to provide PAL builds.
My problem is: LLVM seems to build the kernels for ROCm correctly when using
clang-8 -std=CL1.2 -target amdgcn-amd-amdhsa-opencl -mcpu=polaris10 -c -O3 ./kernel.cl
But the moment I try to build the kernel for the PAL target with
clang-8 -std=CL1.2 -target amdgcn-amd-amdpal-opencl -mcpu=polaris10 -c -O3 ./kernel.cl
I am confronted with a lot of assembler errors:
/tmp/kernel-d4a333.s: Assembler messages:
/tmp/kernel-d4a333.s:2: Error: unknown pseudo-op: `.hsa_code_object_isa'
/tmp/kernel-d4a333.s:7: Error: Keine solche Anweisung: »s_getpc_b64 s[36:37]«
/tmp/kernel-d4a333.s:8: Error: Keine solche Anweisung: »s_mov_b32 s36,s0«
/tmp/kernel-d4a333.s:9: Error: Keine solche Anweisung: »s_load_dwordx4 s[36:39],s[36:37],0x0«
Here "Keine solche Anweisung" is the German wording for "no such instruction".
If I instead ask the compiler to output LLVM IR, that seems to work, but I am not sure whether the result can be consumed as a binary kernel (I am running ROCm myself).
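To make the IR variant concrete, here is a minimal sketch of the kind of invocation meant. It assumes the same flags as the two commands above, with -S -emit-llvm in place of -c so that no assembler step runs. The helper name ir_cmd and the output file names kernel-amdhsa.ll / kernel-amdpal.ll are made up for illustration, and the script only prints the commands so they can be checked before running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of any toolchain): print the clang command
# that emits textual LLVM IR for a given target OS, e.g. amdhsa or amdpal.
# Assumption: -S -emit-llvm replaces -c, so clang stops after IR generation.
ir_cmd() {
    printf 'clang-8 -std=CL1.2 -target amdgcn-amd-%s-opencl -mcpu=polaris10 -O3 -S -emit-llvm -o kernel-%s.ll ./kernel.cl\n' "$1" "$1"
}

# Print one command per target; nothing is executed here.
ir_cmd amdhsa
ir_cmd amdpal
```

Piping the output into sh would run the printed commands. Whether the resulting .ll file is usable as a binary kernel on PAL is exactly the open question.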
Note that this also happens for a kernel that would normally compile for amdpal, i.e. one without special instructions or inline asm. The kernel also contains #include "opencl-c.h" at the beginning; otherwise it would not compile for ROCm.
So I wonder what I am doing wrong, or whether there is a better way to achieve my goal. Any instructions on how to do this are appreciated.
Edit / Remark: The LLVM version is the one from the Ubuntu 18.04 repos, in case this matters.