I am trying to perform some modifications to compiled kernels at the LLVM bytecode level. I want to use KernelAnalyzer to measure the performance impact of my modifications. My workflow is basically:
1) Dump the kernel binary (BIF).
2) Extract the LLVM bytecode from the binary.
3) Modify the bytecode.
4) Reassemble a new BIF containing only a .llvmir section, which holds the modified bytecode.
5) Call clCreateProgramWithBinary and clBuildProgram, then dump the binary again. Then extract the .amdil section and analyze the kernel with KernelAnalyzer.
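To make steps 2 and 5 concrete, here is a rough sketch of how the sections can be pulled out of a dumped binary and how the rebuilt binary can be checked for the sections it should contain. It assumes the BIF is a plain ELF container that can be parsed with the generic ELF header layout. The function names read_elf_sections and missing_sections are made up for this illustration and are not part of any AMD tool.

```python
import struct

def read_elf_sections(blob):
    """Return {section_name: bytes} for an ELF image held in memory.

    Assumption: the dumped BIF is an ordinary ELF file (32- or 64-bit).
    """
    if blob[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is64 = blob[4] == 2                      # EI_CLASS: 1 = ELF32, 2 = ELF64
    end = "<" if blob[5] == 1 else ">"       # EI_DATA: 1 = little-endian
    if is64:
        ehdr_fmt, shdr_fmt = end + "HHIQQQIHHHHHH", end + "IIQQQQIIQQ"
    else:
        ehdr_fmt, shdr_fmt = end + "HHIIIIIHHHHHH", end + "IIIIIIIIII"

    # The fixed-size header fields start right after the 16-byte e_ident.
    fields = struct.unpack_from(ehdr_fmt, blob, 16)
    e_shoff = fields[5]
    e_shentsize, e_shnum, e_shstrndx = fields[10], fields[11], fields[12]

    headers = []
    for i in range(e_shnum):
        sh = struct.unpack_from(shdr_fmt, blob, e_shoff + i * e_shentsize)
        # Keep: name offset, type, file offset, size.
        headers.append((sh[0], sh[1], sh[4], sh[5]))

    # Section names live in the section-header string table.
    _, _, str_off, str_size = headers[e_shstrndx]
    strtab = blob[str_off:str_off + str_size]

    sections = {}
    for name_off, sh_type, off, size in headers:
        name_end = strtab.index(b"\0", name_off)
        name = strtab[name_off:name_end].decode("ascii", errors="replace")
        # SHT_NOBITS (8) sections occupy no bytes in the file.
        sections[name] = b"" if sh_type == 8 else blob[off:off + size]
    return sections

def missing_sections(blob, wanted=(".llvmir", ".amdil", ".text")):
    """List the wanted sections that are absent or empty in the image."""
    sections = read_elf_sections(blob)
    return [name for name in wanted if not sections.get(name)]
```

With this, step 2 amounts to writing read_elf_sections(blob)[".llvmir"] to a file, and after step 5 missing_sections(blob) on the re-dumped binary shows at a glance whether .amdil and .text were regenerated.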
If I use the originally extracted LLVM bytecode unmodified, everything works with my reconstructed BIF, and step 5 recreates a full binary (including the .text and .amdil sections as well as .llvmir). But even the slightest modification to the LLVM bytecode causes step 5 to fail: it reports no error, but the .text and .amdil sections are not created. I understand that the LLVM bytecode of kernels has some restrictions, but it seems strange that even minor modifications violate those restrictions.
Is there any documentation or clue on what the LLVM bytecode of compiled kernels has to look like? Keep in mind that I am not creating one from scratch but starting from a correct one, so all the required globals (sgv, fgv and friends) are in place.
Thanks a lot,
Carlos