To be honest, I am a little tired of writing topics like this on the AMD forum, because nobody reads them or even knows what they are about! But I have always been an AMD/ATI fan, and I am just trying to help AMD improve their CPUs. I have a lot of experience with ASM on Z80 CPUs, which is why I have a lot of ideas about how to improve modern CPUs, because out-of-order execution is not very efficient in modern scientific (calculation-intensive) applications. I have tried several times to contact AMD and tell them about this, but it has proved impossible! I have no time to write papers for the relevant journals to prove my concept to developers. I hope AMD's processor development team is experienced enough to understand my ideas and check them for consistency.
1: Can you implement on-chip microcode optimization? I mean: if the CPU fetches more than one instruction for decoding and builds a sequence of micro-ops (without vector path and direct path), it could then analyze this sequence and determine whether it can skip some ops or replace them with more efficient sequences. (This could be done using templates in on-chip ROM for the most frequent instruction sequences.)
2: Can you implement SIMD-like microcode? I mean: if the CPU fetches more than one instruction, it could use SIMD microcode to decode instruction sequences. For example, suppose an instruction sequence needs to copy the values of 4 registers into 4 other registers. In the current architecture this takes 4 DirectPath ops; in the proposed architecture it would be one SIMD microcode op. It could be realized in a similar way to 1.
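To make ideas 1 and 2 a bit more concrete, here is a toy software model of such a pass. It is only a sketch under my own assumptions: the micro-op encoding `(opcode, destinations, sources)`, the `vmov` op and both function names are hypothetical and do not describe any real AMD or Intel microcode format.

```python
from typing import List, Tuple

# Hypothetical micro-op encoding: (opcode, destination regs, source regs).
MicroOp = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def drop_self_moves(ops: List[MicroOp]) -> List[MicroOp]:
    """Idea 1: skip ops that have no effect, here a move of a register to itself."""
    return [op for op in ops if not (op[0] == "mov" and op[1] == op[2])]


def fuse_moves(ops: List[MicroOp], width: int = 4) -> List[MicroOp]:
    """Idea 2: pack up to `width` adjacent independent moves into one 'vmov'.

    The fused op is assumed to read all its sources before writing any
    destination, so a move may only join a group if it does not read or
    rewrite a register written earlier in the same group.
    """
    out: List[MicroOp] = []
    i = 0
    while i < len(ops):
        group: List[MicroOp] = []
        written = set()
        j = i
        while j < len(ops) and len(group) < width and ops[j][0] == "mov":
            dst, src = ops[j][1][0], ops[j][2][0]
            if src in written or dst in written:
                break  # dependency on an earlier move in the group
            group.append(ops[j])
            written.add(dst)
            j += 1
        if len(group) >= 2:
            dsts = tuple(g[1][0] for g in group)
            srcs = tuple(g[2][0] for g in group)
            out.append(("vmov", dsts, srcs))
            i = j
        else:
            out.append(ops[i])
            i += 1
    return out


# Four register-to-register moves followed by an unrelated add.
example = [
    ("mov", ("r4",), ("r0",)),
    ("mov", ("r5",), ("r1",)),
    ("mov", ("r6",), ("r2",)),
    ("mov", ("r7",), ("r3",)),
    ("add", ("r0",), ("r4", "r5")),
]
fused = fuse_moves(drop_self_moves(example))
```

In this model the four moves collapse into a single `vmov` and the `add` is left alone. Real hardware would have to do the same dependency check within the decode window, which is where a small table of templates for frequent sequences could help.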
University of Exeter.
Your first point already appeared in some AMD patent(s) a while ago. But what would be the reason for such code?
Either the compiler delivered code that is still optimizable (this should be handled on the software side simply because that is cheaper; it could even be done by an executable binary postprocessor),
or the x86 ISA's limitations are the cause of code that could be expressed more efficiently using the internal µOp capabilities (think of 3-operand µOps), which would limit the necessary effort to some general patterns (like cmp+branch µOp fusion) that appear rather often.
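As a rough illustration of what such a general pattern looks like, here is a toy model of cmp+branch fusion. The tuple encoding, the set of jump mnemonics and the fused op names are assumptions made for this sketch only; it does not reflect how any real decoder represents its µOps.

```python
from typing import List, Tuple

# Hypothetical encoding: ("cmp", a, b) and (jump mnemonic, target label).
Op = Tuple[str, ...]

# A small, assumed subset of conditional jumps eligible for fusion.
COND_JUMPS = {"je", "jne", "jl", "jle", "jg", "jge"}


def fuse_cmp_branch(ops: List[Op]) -> List[Op]:
    """Replace each 'cmp' directly followed by a conditional jump with one fused op."""
    out: List[Op] = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if op[0] == "cmp" and nxt is not None and nxt[0] in COND_JUMPS:
            # Fused op carries both compare operands and the branch target.
            out.append(("cmp_" + nxt[0], op[1], op[2], nxt[1]))
            i += 2
        else:
            out.append(op)
            i += 1
    return out


loop_tail = [
    ("add", "eax", "1"),
    ("cmp", "eax", "ebx"),
    ("jne", "loop"),
]
fused_tail = fuse_cmp_branch(loop_tail)
```

The point is that only adjacent pairs matching one fixed template are touched, so the matching logic stays small compared with a general optimizer.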
Yes, of course! But I have explored a lot of compilers (PGI 8, GCC 4.3.3, IC 10.1, MSVC++ 2008) and found for myself that they do not optimize code well!
It can be done, but no one is doing it!
So, if the CPU could do it at the MOPs level, it would be perfect! x86 is just a GHOST now; Intel and AMD just emulate it in MOPs, and that's all! I think MOPs could be multi-vector (processing up to twenty operands), and that could greatly improve CPU performance in scientific and media applications!
In addition: AMD must improve its CPU architecture, because they need to sell their products and make money! If the CPU is not improved from generation to generation (meaning that only the frequency rises), then people will not buy it and AMD will lose money!
And I think that multi-vec MOPs are the right way to improve next-gen AMD CPUs!