These instructions require hardware setup outside of the binary that is not accessible. So while your modification for certain instructions will work, instructions that have external dependencies will not work.
For the instructions you are having to modify the binary for, in what cases do you need them exposed? If you can provide examples, we can fix the compiler so it generates them correctly.
I've just tried a few more cases with GROUP_SEQ_BEGIN/END and it seems to work, but not in the way I was hoping. The documentation has conflicting statements: one says that each work item will run in sequence, the other that each wavefront in a workgroup will run sequentially. The second seems likely to be correct, though I can't be sure because, as you said, some hardware setup is needed.
As for the instructions I modify: the compiler never seems to generate ADD_PREV, MUL_PREV, or MULADD_PREV. There's nothing magic about these instructions, but they help with ALU packing. There are also some float->int and int->float conversion instructions that can go into the xyzw slots, allowing 4 conversions per VLIW instruction instead of 1.
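To make the conversion case concrete, here is the kind of source pattern I mean, written as plain C rather than as a kernel. The function names are made up for illustration. The hope is that four independent conversions like these would land in the x, y, z, w slots of a single VLIW instruction, though I can't say which opcode the compiler would actually pick.

```c
/* Hypothetical helpers, for illustration only (not part of any SDK). */

/* Four independent float->int conversions; a C cast truncates toward zero. */
void convert4_f2i(const float in[4], int out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = (int)in[i];
    }
}

/* Four independent int->float conversions. */
void convert4_i2f(const int in[4], float out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = (float)in[i];
    }
}
```

None of the four conversions depends on another, so in principle nothing forces them to be issued one per instruction.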
Also, the compiler doesn't seem to generate code that uses destination register modifiers such as ADD_SAT, MUL_SAT, MULADD/2, MULADD*2, etc.
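Since you asked for examples, these are roughly the source-level patterns I would expect to map onto those modifiers. This is a plain C sketch with made-up function names, and it assumes that _SAT means a clamp of the result to [0, 1] and that /2 and *2 scale the result of the operation. I haven't confirmed what the compiler needs to see in order to fold them.

```c
/* Hypothetical helpers, for illustration only. */

/* Clamp to [0, 1]: the pattern I'd hope becomes a _SAT modifier (assumption). */
float saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Candidate for ADD_SAT: add, then clamp. */
float add_sat(float a, float b)
{
    return saturate(a + b);
}

/* Candidate for MUL_SAT: multiply, then clamp. */
float mul_sat(float a, float b)
{
    return saturate(a * b);
}

/* Candidate for MULADD/2: multiply-add, then halve the result. */
float muladd_half(float a, float b, float c)
{
    return (a * b + c) * 0.5f;
}

/* Candidate for MULADD*2: multiply-add, then double the result. */
float muladd_x2(float a, float b, float c)
{
    return (a * b + c) * 2.0f;
}
```

In each case the clamp or scale is applied to the result of a single ALU operation, which is why I'd expect it to fold into the destination modifier instead of costing a separate instruction.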
It also seems that the compiler always generates a SET_?? and PREDE_INT/PREDNE_INT pair of instructions instead of using a PRED_SET?? instruction directly.
Some time ago I wrote about a problem with the read_image function using the SAMPLE instruction and doing some math with the coordinates, instead of using LD when that's possible.