There are some examples of inline assembly inside .cl files:
gatelessgate/equihash.cl at master · zawawawa/gatelessgate · GitHub
But I cannot find out how they can be compiled.
There is some guide:
HIP/samples/2_Cookbook/10_inline_asm at master · ROCm-Developer-Tools/HIP · GitHub
But it's not clear how exactly to compile .cl files with inline asm.
There was some discussion about inline asm in OpenCL:
But I think that's obsolete.
P.S. I use the AMDGPU-PRO driver. All the articles I found talk about the ROCm developer tools. Can I compile with the ROCm compiler and run the result with the AMDGPU-PRO driver?
Also, if I can turn OpenCL code with inline asm into an ISA file and then assemble that ISA with a pure asm compiler, that will also work for me.
I just found this:
It seems that you can just export an environment variable that points to ROCm's OpenCL and build your project as usual. I wonder whether that can handle inline asm automatically.
After some experimenting, it seems difficult to find an OpenCL driver that can compile a kernel with inline assembly on the fly, so you need to use an offline compiler. One option is LLVM. But when I ran it, I found that functions like `amd_bitalign` or `amd_bfe` are not supported by LLVM's clang. Any suggestions?
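One possible workaround for the missing built-ins is to define them yourself in terms of plain shifts. The sketch below is ordinary C rather than OpenCL C, and the names `bitalign_fallback` and `bfe_fallback` are hypothetical. The semantics follow my reading of the `cl_amd_media_ops` and `cl_amd_media_ops2` extension descriptions (scalar, unsigned variants only), so treat it as an assumption and check the results against the real built-ins on hardware before relying on it.

```c
#include <stdint.h>

/* Assumed semantics of amd_bitalign: treat hi:lo as one 64-bit value,
 * shift it right by (shift & 31) bits, and keep the low 32 bits. */
uint32_t bitalign_fallback(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | (uint64_t)lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* Assumed semantics of the unsigned amd_bfe: extract `width` bits
 * starting at bit `offset`; both are masked to 5 bits, and a width
 * of 0 yields 0. */
uint32_t bfe_fallback(uint32_t src, uint32_t offset, uint32_t width)
{
    offset &= 31u;
    width &= 31u;
    if (width == 0u)
        return 0u;
    if (offset + width < 32u)
        return (src << (32u - offset - width)) >> (32u - width);
    /* The field reaches the top bit: a plain logical shift is enough. */
    return src >> offset;
}
```

In a .cl file the same expressions should carry over with `uint` and `ulong` in place of `uint32_t` and `uint64_t`, for example behind a macro that is only defined when the offline compiler lacks the built-ins.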
dipak Thanks.
Now I am trying to build an OpenCL kernel binary with LLVM. I successfully compiled the .cl file into assembly, but I cannot figure out how to turn that assembly into a binary that can run with the AMDGPU-PRO driver. The format is not compatible with the CLRadeonExtender assembler, so it seems I need to use LLVM's own tools to build the binary. How can I do that? Either a method to compile that assembly format into a binary, or a method to compile an OpenCL file directly into a binary, would help me.
Adding matszpk, who developed CLRadeonExtender, in case he can suggest anything in this regard.
Thanks.
Very likely, LLVM compiles the '.cl' file into the ROCm format (with an HSA kernel header). CLRadeonExtender can assemble files into the ROCm binary format, but it uses its own directives (pseudo-ops) to describe ROCm metadata, so an assembly source generated by LLVM has to be rewritten as CLRadeonExtender assembly source. The ROCm binary can be loaded by the ROCm OpenCL platform, but not by the AMDOCL (ORCA or PAL) OpenCL platform. Moreover, ROCm OpenCL uses a slightly different calling convention than AMDOCL, so the assembly code has to be changed in that respect as well. AMDOCL and ROCm each use their own metadata format, and the two are incompatible. To learn about the binary formats, see ROCm-ComputeABI-Doc/AMDGPU-ABI.md at master · ROCm-Developer-Tools/ROCm-ComputeABI-Doc · GitHub, User Guide for AMDGPU Backend — LLVM 8 documentation, and CLRadeonExtender CLRXDocs.
Thanks matszpk. I wrote a script to convert the metadata in the asm file compiled by LLVM into clrxasm format. Is converting the metadata sufficient? That is, do I need to modify the .text code, for example to change how the arguments are passed into a kernel? Or can I just reuse the asm code generated by LLVM? (For sure some code needs small changes, such as changing "offset" to "instr_offset".)