I think the short answer is that there is no way to directly control the generated ISA.
I can only give you a few pieces of advice and clear up a few misunderstandings.
1. On a CPU, code written in assembler translates directly into the binary code the CPU executes (assembly is a second-generation programming language). This isn't the case with IL and ISA: on the GPU, IL is effectively a high-level programming language, and there is no direct one-to-one translation between IL and ISA.
2. Because IL is a high-level programming language, optimising IL register usage doesn't make sense. The IL compiler packs the registers that are actually used (!) into hardware ISA registers. So you can write code with 100 IL registers using only the x component and code with 25 registers using all xyzw components, and both will use exactly the same number of ISA registers.
3. CAL++ overhead usually consists of extra mov instructions. The IL compiler (remember, IL is a high-level language) has no problem removing those. The extra registers used are also of no importance (see point 2).
4. CAL++ doesn't use gcc for optimization. Code written in CAL++ is emitted directly to IL.
5. I haven't seen a kernel that can't be written just as efficiently in CAL++. Since a CAL++ kernel is much easier to write, it really doesn't make sense to use IL directly (it's a huge waste of time).
6. Some IL instructions (for example ddiv) are converted to many ISA instructions (ddiv on 4xxx cards produces more than 40 ops).
7. The IL compiler is sometimes really bad/stupid. There are situations where it applies optimisations that hugely increase ISA register usage.
8. You can usually trick the IL compiler into generating more efficient code by changing the order of instructions.
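9. There is a standard trick to increase instruction slot usage: do more work in one software thread. The 4xxx ALUs are 5-wide VLIW (slots x, y, z, w, t), so each instruction bundle can only be filled if there are enough independent operations. When you work on 5 elements at the same time, you are guaranteed full slot usage (usually it's enough to work on 2-4, because each element's own computation often has some independent operations too).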
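To make the arithmetic in point 2 concrete, here is a back-of-the-envelope model, not the real IL compiler. It assumes, hypothetically, that every used scalar component is live for the whole kernel and that the compiler packs components perfectly into 4-wide (xyzw) hardware registers; real allocation works on live ranges, so actual counts can be lower. The function name `isa_registers_needed` is purely illustrative.

```python
import math

# Simplifying assumption: all used components are live at the same time and
# are packed perfectly into 4-component (x, y, z, w) hardware registers.
ISA_REGISTER_WIDTH = 4


def isa_registers_needed(il_registers: int, components_per_register: int) -> int:
    """Estimate hardware registers for IL registers that each use N components."""
    used_components = il_registers * components_per_register
    return math.ceil(used_components / ISA_REGISTER_WIDTH)


# 100 IL registers using only .x versus 25 IL registers using .xyzw
scalar_style = isa_registers_needed(il_registers=100, components_per_register=1)
vector_style = isa_registers_needed(il_registers=25, components_per_register=4)
print(scalar_style, vector_style)  # 25 25
```

Under this model only the number of used components matters, which is why spreading values over many IL registers costs nothing extra.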
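Point 3 relies on the IL compiler removing redundant copies. The sketch below shows the general idea with a toy copy-propagation pass; it is not AMD's compiler and not CAL++ output. The instruction format, register names and the `live_out` parameter are hypothetical, and the pass assumes SSA-like code where every register is written exactly once.

```python
from typing import FrozenSet, List, Tuple

# Toy instruction: (opcode, destination, sources). Assumes each register is
# written exactly once, so replacing a copy by its source is always safe.
Instr = Tuple[str, str, Tuple[str, ...]]


def eliminate_movs(program: List[Instr],
                   live_out: FrozenSet[str] = frozenset()) -> List[Instr]:
    """Drop 'mov' copies by rewriting later reads to the copy's source."""
    alias = {}  # mov destination -> register it was copied from
    result: List[Instr] = []
    for op, dst, srcs in program:
        # Read through earlier copies; chains collapse because the source
        # of a mov is itself already rewritten here.
        srcs = tuple(alias.get(s, s) for s in srcs)
        if op == "mov" and dst not in live_out:
            alias[dst] = srcs[0]  # remember the copy instead of emitting it
        else:
            result.append((op, dst, srcs))  # real work, or a copy that must survive
    return result


# Wrapper-style code with two redundant copies between mul and add.
program = [
    ("mul", "r0", ("a", "b")),
    ("mov", "r1", ("r0",)),
    ("mov", "r2", ("r1",)),
    ("add", "out", ("r2", "c")),
]
optimized = eliminate_movs(program)
for instr in optimized:
    print(instr)
```

The two copies disappear and `add` reads `r0` directly, which is the kind of cleanup that makes the extra movs a wrapper emits essentially free.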
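The slot-usage argument in point 9 can be approximated with a very simple model. It assumes, hypothetically, that any operation can go in any of the five VLIW slots (real hardware restricts some operations, such as transcendentals, to the t slot) and that each element offers `ilp_per_element` independent operations per step; both the function and its parameters are illustrative only.

```python
# Toy model of one 5-wide VLIW bundle (slots x, y, z, w, t).
# Assumptions: every op fits any slot; elements are independent of each other.
SLOTS_PER_BUNDLE = 5


def slot_utilization(elements_per_thread: int, ilp_per_element: int = 1) -> float:
    """Fraction of bundle slots the compiler can fill with independent ops."""
    independent_ops = elements_per_thread * ilp_per_element
    filled = min(independent_ops, SLOTS_PER_BUNDLE)
    return filled / SLOTS_PER_BUNDLE


for k in range(1, 6):
    # Compare purely serial elements with elements that have 2-way internal ILP.
    print(k, slot_utilization(k), slot_utilization(k, ilp_per_element=2))
```

With fully serial elements you need 5 of them per thread to fill a bundle, but once each element has a little internal parallelism, 3 elements already saturate it, which matches the "2-4 is usually enough" observation.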
I hope this helps.