So here is my story: I wrote my code fully optimised in IL. The result is that my code is about 50% slower than the huge IL output from cal++, which is 4 times bigger. On a real ASM system my code would get 30% more performance, but as far as I can see IL is only a trick: the final ISA is generated from my code by optimising it first. So from my experiments, the point is not to write optimised IL assembly but very messy and huge code, which gives the assembler a better chance of turning it into ISA that fully utilises the x, y, z, w, t slots.
And now the question: how can I get real control over the hardware, and is there any way to optimise my code? As far as I can see, the generated ISA assembly in many cases does everything in a very lazy way: it picks up too many "easy" instructions at the start and leaves only the "hard" instructions for the end, where, with integers, each one consumes a clock cycle without fully utilising x, y, z, w, t. I think this optimisation approach was copied from GCC, because the ISA assembler seems to like, and works best with, the GCC-optimised code from the cal++ output.
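To make clear what I mean by utilisation, here is a toy model in Python. It is only a sketch and not how the real ISA assembler works: the names `pack` and `utilization` are made up for this post, all five slots are treated as equal (the real t slot is not), and the packer is a simple greedy one that takes ready instructions in program order.

```python
from typing import Dict, List

SLOTS = 5  # x, y, z, w, t (treated as identical in this toy model)


def pack(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Greedily pack ops into bundles of up to SLOTS ops.

    An op may issue only after all of its inputs were issued in an
    earlier bundle. Ops are considered in program (dict) order.
    """
    done = set()
    pending = list(deps)
    bundles = []
    while pending:
        ready = [op for op in pending if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown input")
        bundle = ready[:SLOTS]
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in bundle]
    return bundles


def utilization(bundles: List[List[str]]) -> float:
    """Fraction of slots that hold an instruction."""
    return sum(len(b) for b in bundles) / (SLOTS * len(bundles))


# "Hard" ops: a chain where each op needs the previous result.
chain = {f"c{i}": ([f"c{i-1}"] if i else []) for i in range(5)}
# "Easy" ops: no inputs, can go anywhere.
indep = {f"i{i}": [] for i in range(5)}

indep_first = {**indep, **chain}  # easy ops written first
chain_first = {**chain, **indep}  # hard ops written first

if __name__ == "__main__":
    for name, prog in (("indep_first", indep_first), ("chain_first", chain_first)):
        b = pack(prog)
        print(name, len(b), "bundles, utilisation", round(utilization(b), 2))
```

With the easy instructions first, this toy packer fills the first bundle with them and then issues the chain one instruction per bundle (6 bundles). With the chain first, the easy instructions fill the gaps and the same 10 instructions fit in 5 bundles. That is the kind of effect I see in the generated ISA, only on a much larger scale.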
I know it is easy to control how x, y, z, w, t are utilised in short code, but I am talking about code with 800-1200 instructions. Right now this is a big unknown for me, and it is more luck than real coding and optimising when the ISA assembler cannot be controlled. Also, can _prec and _precmask be used to control how the code is optimised by the ISA assembler? They are documented in a rather weird way.