I've been trying to figure out a way to execute a kernel from its .isa file. However, I'm not sure whether there is any connection between the .il file and the .isa file.
I have noticed that .isa files are shorter than .il files, which makes them a much better candidate for tweaking. I initially thought the intermediate language would sit somewhere between OpenCL and ISA code. Can somebody please help me understand the nature of these two files?
ISA is hardware-specific code.
AMD_IL is an intermediate language that has some hardware-specific aspects.
For example, if you compile a kernel for an HD6970 and again for an HD7970, the IL files will be pretty much the same (mostly differences in buffer declarations and some optimizations). But the ISA files will be totally different, because those cards use two different architectures (VLIW4 and GCN, respectively).
FYI, the complete workflow is OpenCL -> LLVM IR -> AMD_IL -> ISA.
ISA files are shorter because they're heavily optimized. AMD_IL files are mostly longer because they contain lots and lots of code that converts between 16-byte and dword access (16-byte access is natural to the card, while dword access is natural to OpenCL).
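To make the 16-byte versus dword mismatch concrete, here is a small C sketch of what emulating an OpenCL dword load on top of 16-byte accesses involves: every dword index has to be split into a 16-byte slot and a component. This is a conceptual model only, not actual AMD_IL or compiler output; the `vec4_u32` type and `load_dword` function are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of memory whose natural access unit is 16 bytes:
   four 32-bit components x, y, z, w. */
typedef struct {
    uint32_t c[4]; /* index 0 = x, 1 = y, 2 = z, 3 = w */
} vec4_u32;

/* Emulate an OpenCL-style dword load using only 16-byte accesses:
   split the dword index into a 16-byte slot and a component. */
uint32_t load_dword(const vec4_u32 *buf, size_t dword_index)
{
    size_t slot = dword_index >> 2;  /* which 16-byte element */
    size_t comp = dword_index & 3u;  /* which component inside it */
    vec4_u32 v = buf[slot];          /* one natural 16-byte access */
    return v.c[comp];                /* select the requested dword */
}
```

For example, dword index 5 lands in slot 1, component y. The shifts, masks and component selects are the kind of bookkeeping that makes the IL listing longer than the final ISA.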
Thank you, I will keep this in mind. However, here is an idea:
AMD_IL -->tweaking level 1 --> ISA(highly optimized) -->tweaking level 2-->AMD_IL(highly optimized)-->binary
Is there a way to achieve this? Since each instruction can be deterministically mapped to its corresponding higher language construct, it should be possible. If we can do this, we no longer have to write assembler code for specific hardware.
"AMD_IL -->tweaking level 1 --> ISA(highly optimized) -->tweaking level 2-->AMD_IL(highly optimized)-->binary"
I don't understand this.
When you optimize something in OpenCL, you can check its result in AMD_IL and ISA form and see exactly what the hardware does. Then you can modify your OpenCL program to get a more optimal version of the ISA code. This way you can detect register spilling, VLIW unit under-utilization, excessive kernel size, etc.
Alternatively, you can rewrite the whole thing from scratch in lightly hardware-specific AMD_IL or in strongly hardware-specific ISA.
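As a rough illustration of what "VLIW under-utilization" means numerically, the following C sketch computes the fraction of VLIW slots that are actually filled, given how many slots each instruction bundle uses. It assumes you have already counted the occupied slots per bundle by reading the ISA disassembly; `vliw_utilization` is a hypothetical helper, not part of any AMD tool.

```c
#include <stddef.h>

/* Fraction of VLIW slots doing useful work.
   used_slots[i] = number of occupied slots in bundle i (read off the ISA).
   slots_per_bundle = 4 for VLIW4 (e.g. HD6970), 5 for VLIW5 parts. */
double vliw_utilization(const unsigned *used_slots, size_t n_bundles,
                        unsigned slots_per_bundle)
{
    if (n_bundles == 0 || slots_per_bundle == 0)
        return 0.0; /* nothing to measure */

    unsigned long used = 0;
    for (size_t i = 0; i < n_bundles; ++i)
        used += used_slots[i];

    /* occupied slots divided by total available slots */
    return (double)used / ((double)n_bundles * slots_per_bundle);
}
```

For bundles using 4, 2, 1 and 3 slots on a VLIW4 part, the result is 10/16 = 0.625, i.e. 37.5% of the ALU slots sit idle. That is a hint that the OpenCL code may need more independent work per work-item.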
"Since each instruction can be deterministically mapped to its corresponding higher language construct."
No, there are many ISA features that cannot easily be mapped to higher levels.
For example, there is a SCALAR processor in the new architecture (GCN), and you cannot program it directly from IL.
s_load_dwordx16 s[16:31], s[0:1], 0  // initiates a load of 64 bytes of constant data from the 64-bit address in s[0:1] into 16 scalar registers (s16..s31)
In IL you just access those 16 constants as cb2.x .. cb2.w, and it is up to the compiler whether it uses the compact s_load_dwordx16 or several smaller loads.
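The register layout of that s_load_dwordx16 can be spelled out with a small C sketch. It assumes the 64-byte block is viewed in IL as four 16-byte constant elements with components x, y, z, w (one reading of "cb2.x .. cb2.w"), starting exactly at the loaded address; `sgpr_for_constant` is a hypothetical helper, not a compiler API.

```c
/* Layout assumed for: s_load_dwordx16 s[16:31], s[0:1], 0
   16 dwords (64 bytes) land in s16..s31. */
enum { SLOAD_BASE_SGPR = 16, SLOAD_DWORDS = 16 };

/* Map an IL-style constant (16-byte element, component x=0..w=3)
   to the scalar register that receives it.
   Returns -1 if the constant lies outside the loaded 64-byte block. */
int sgpr_for_constant(unsigned element, unsigned component)
{
    if (component > 3)
        return -1;                                 /* only x, y, z, w exist */
    unsigned dword = element * 4u + component;     /* byte offset / 4 */
    if (dword >= SLOAD_DWORDS)
        return -1;                                 /* not covered by this load */
    return SLOAD_BASE_SGPR + (int)dword;
}
```

Under these assumptions, element 1 component z maps to s22, and element 3 component w maps to s31, the last register of s[16:31]. Whether the compiler issues one x16 load or several smaller ones only changes how many instructions fill the same registers.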
Is this possible:
opencl Kernel ------------BuildProgramWithSource--------------> binary
-- kernel.il (generated file) ---- update this file ---------> new binary (.isa part will be updated)
-- kernel.isa (generated file) --- update this file --------> new binary 2 (.il part will be updated)
If the ELF file keeps these sections consistent, then we can extract the corresponding .il file for a given .isa and vice versa.
"-- kernel.il (generated file) ---- update this file ---------> new binary (.isa part will be updated)"
Possible: you have to supply an .elf that contains only the .il section. IMO it's not worth it.
"-- kernel.isa (generated file) --- update this file --------> new binary 2 (.il part will be updated)"
Impossible with BuildProgramWithSource.
Also, I think calling kernel.isa a 'generated file' is a bit misleading: it is a disassembled image.
The binary .il part in the inner .elf was redundant if there was ISA microcode in that .elf, and with Catalyst 13.4 it was removed from the inner .elf. Now the inner .elf contains only the kernel's binary machine code.
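If you want to check for yourself whether a program binary contains an embedded inner .elf, one simple heuristic is to scan it for the ELF magic bytes (0x7F 'E' 'L' 'F'). The C sketch below does only that: it does not parse section headers, and a match inside arbitrary data could be a false positive. The function name `find_elf_magic` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Heuristic scan for ELF header magic (0x7F 'E' 'L' 'F') in a buffer.
   Stores up to max_out match offsets in out and returns the total number
   of matches found (which may exceed max_out). Matches are candidates
   only; nothing here validates the rest of the ELF header. */
size_t find_elf_magic(const unsigned char *buf, size_t len,
                      size_t *out, size_t max_out)
{
    static const unsigned char magic[4] = {0x7F, 'E', 'L', 'F'};
    size_t found = 0;

    if (len < sizeof magic)
        return 0;

    for (size_t i = 0; i + sizeof magic <= len; ++i) {
        if (memcmp(buf + i, magic, sizeof magic) == 0) {
            if (found < max_out)
                out[found] = i; /* record offset of this candidate */
            ++found;
        }
    }
    return found;
}
```

On a binary obtained with clGetProgramInfo and CL_PROGRAM_BINARIES, you would expect offset 0 for the outer image and a further offset for each embedded inner .elf. Inspecting what follows each candidate offset is then a job for a real ELF parser.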