Your first point already appeared in some AMD patent(s) a while ago. But what would be the reason for such code?
Either the compiler delivered code that could still be optimized (which should be handled on the software side simply because that is cheaper, and could even be done by a postprocessor for executable binaries),
or the x86 ISA's limitations are the cause of code that could be expressed more efficiently using the internal µOp capabilities (think of 3-operand µOps) - which would limit the necessary effort to a few general patterns (like cmp+branch µOp fusion) that appear rather often.
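To make the "general patterns" idea concrete, here is a toy sketch of such a pattern matcher in Python. It is only an illustration under loud assumptions: the instruction tuples, the `fuse` function, and the `cmp_jl` / `add3` names are all made up for this example and do not describe how any real AMD or Intel decoder works. It recognizes two patterns: cmp followed by a conditional branch, and mov followed by a 2-operand ALU op on the same destination (rewritten into a 3-operand form).

```python
# Toy model only: instructions are (mnemonic, operands) tuples, not real x86
# encodings, and the fused names below are hypothetical.
BRANCHES = {"je", "jne", "jl", "jle", "jg", "jge"}
ALU = {"add", "sub", "and", "or", "xor"}


def fuse(instrs):
    """Peephole pass over a toy instruction list; returns a list of toy µOps."""
    out = []
    i = 0
    while i < len(instrs):
        op, args = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if nxt is not None:
            nop, nargs = nxt
            # Pattern 1: cmp a, b ; jcc target -> one compare-and-branch µOp.
            if op == "cmp" and nop in BRANCHES:
                out.append(("cmp_" + nop, args + nargs))
                i += 2
                continue
            # Pattern 2: mov d, s ; alu d, x -> alu3 d, s, x (3-operand form).
            # Skipped when x is d, because x would then refer to the copied value.
            if (op == "mov" and nop in ALU
                    and nargs[0] == args[0] and nargs[1] != args[0]):
                out.append((nop + "3", (args[0], args[1], nargs[1])))
                i += 2
                continue
        # No pattern matched: pass the instruction through unchanged.
        out.append((op, args))
        i += 1
    return out


code = [
    ("mov", ("eax", "ebx")),
    ("add", ("eax", "ecx")),
    ("cmp", ("eax", "edx")),
    ("jl", ("loop",)),
    ("ret", ()),
]
fused = fuse(code)
```

In this toy run, five instructions become three µOps. Real hardware fusion has many more restrictions (operand types, flags, instruction boundaries), so treat this purely as a picture of the pattern idea.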
Yes, of course! But I have tried a lot of compilers (PGI 8, GCC 4.3.3, IC 10.1, MSVC++ 2008) and found for myself that they do not optimize code well!
It can be done, but no one is doing it!
So if the CPU can do it at the MOP level, that would be perfect! x86 is just a GHOST now; Intel and AMD just emulate it in MOPs, and that's all! I think MOPs could be multi-vector (processing up to twenty operands!), and that could greatly improve CPU performance in scientific and media applications!
In addition: AMD must improve its CPU architecture, because it needs to sell its products and make money! If the CPU does not improve from generation to generation (I mean, if only the frequency rises), then people will not buy it and AMD will lose money!
And I think that multi-vector MOPs are the right way to improve next-gen AMD CPUs!