Welcome back, you are whitelisted. I have moved this question into the OpenCL forum.
Take a look at SunsetQuest/Asm4GCN (GitHub), balidani/gcnasm (GitHub; Linux, but limited features), HetPas, an assembler and IDE for Radeon graphics cards (Windows only), or my own work, which is still unfinished: CLRadeonExtender (a disassembler and assembler for Linux that supports GalliumCompute and GCN 1.2).
Currently, only Asm4GCN supports inline assembly (I have not tested this myself).
I have been working on a GCN assembler for a couple of months, and I am now finishing and polishing it. A fully fledged assembler based on my work will very likely be available next month.
Hello, I just wanted to chime in about "inline" support in Asm4GCN. Currently, inline assembly (inside an OpenCL kernel) is not supported in Asm4GCN. It was a goal of mine, but I could not get it working reliably; maybe someday.
Asm4GCN will let you mix an OpenCL kernel and a GCN kernel in the same CLProgram. That may help. One limitation, however, is that only one GCN kernel is supported; this is a to-do item.
By the way, PTX in CUDA is not true assembly; it is an IL (intermediate language). SASS is NVIDIA's assembly language, but there is no SASS assembler available. So technically only AMD GPUs have true hardware assembly access.
I do not know of any programs that support inline injected assembly for GCN. (Inline SASS is also not supported on NVIDIA.)
I forgot about one: the development version of GalliumCompute (with the development version of LLVM) handles inline assembly, but that feature is at an experimental stage.
GalliumCompute works only on Linux systems with the open-source Mesa3D drivers. Installing it is a bit cumbersome on regular Linux distros (it requires uninstalling the standard packages and installing from source). I have not found enough free time to test this very interesting feature. Nevertheless, I will try to test it and write about the results.
More info about that: "AMD's GPU LLVM Backend Gains Support For Assembler & Inline Assembly" (Phoronix)