I use a hex editor to patch compiled OpenCL kernel binaries and replace individual instructions. This way I've been able to take advantage of instructions such as MUL_PREV and FLT_TO_INT_FLOOR (available in the xyzw slots). There are a few more useful instructions like these that the compiler doesn't use.
I'm looking for information on how to use the GROUP_SEQ_START and GROUP_SEQ_END instructions available on the Evergreen architecture:
- Are they available on the HD 6850?
- Do they require all threads within a workgroup to reach them, as the GROUP_BARRIER instruction does?
- Do they take any arguments?
- Do they require any additional information in the binary file to work? Perhaps some setup by the OpenCL runtime that is not accessible through the current API?
- Can a START/END pair span multiple ALU clauses?
- Where should they be placed: as the first/last instruction of an ALU clause, or anywhere within a clause? Which VLIW slot should they use?
So far I have been unable to make these instructions work. They don't cause any crashes and act as NOPs on my video card. I found no information on the web, so I thought I'd write a post myself. I understand I'm asking for low-level details, but since there's no official way to use the full potential of the video card, people have to resort to hacks such as manually editing binaries.
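
For context, the patching itself amounts to overwriting a few bytes at a known offset in the kernel binary. Below is a minimal Python sketch of that step. The helper name `patch_bytes`, the offset, and all byte values are made-up placeholders for illustration; they are not real Evergreen instruction encodings. The helper refuses to patch unless the bytes currently at the offset match what is expected, which guards against patching the wrong location.

```python
def patch_bytes(blob: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Return a copy of blob with `expected` at `offset` replaced by `replacement`.

    Hypothetical helper: it knows nothing about the kernel binary format,
    it only swaps equal-length byte sequences at a fixed offset.
    """
    if len(expected) != len(replacement):
        raise ValueError("replacement must have the same length as the original bytes")
    if offset < 0 or offset + len(expected) > len(blob):
        raise ValueError("offset is outside the binary")
    found = blob[offset:offset + len(expected)]
    if found != expected:
        raise ValueError(f"unexpected bytes at offset {offset}: {found.hex()}")
    return blob[:offset] + replacement + blob[offset + len(expected):]


# Placeholder data standing in for a kernel binary (not a real encoding).
original = bytes.fromhex("00112233445566778899aabbccddeeff")

# Swap the 4 placeholder bytes at offset 8 for another 4 placeholder bytes.
patched = patch_bytes(original, 8, bytes.fromhex("8899aabb"), bytes.fromhex("deadbeef"))
print(patched.hex())
```

On a real binary the same call would be applied to the file contents read with `open(path, "rb")`, and the result written back to a copy of the file rather than the original.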