Some suggestions to improve IPC on the K10+

Discussion created by avk on Sep 2, 2008
Latest reply on Sep 9, 2009 by avk
Perhaps some of them already exist in the 45nm K10...

Well, of course, I'm not one of those guys who architect chips. But while reading the K10 optimization manual (#40546), especially Appendix C "Instruction Latencies", it occurred to me that several instructions could gain throughput if AMD (slightly?) improved the FSTORE unit so that it could execute them too:

1) Almost all of the "MOVxxx xmmreg1, xmmreg2" forms, such as MOVSS/D, MOVLHPS/D, MOVHLPS/D, MOVSLDUP, MOVSHDUP. The most important ones here are MOVSS/D, which are frequently used in MSVC-generated code.

2) The next target is the data shuffling instructions (PACKxxxx, UNPCKxxxx, xSHUFxxx). I'm not quite sure how difficult it would be to implement these instructions in the FSTORE, but I think it would be somewhat easier than implementing a whole FADD.

3) The last target is the 128-bit logical operations (xANDx, xORx, etc.). The arguments are the same as in 2).
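
To make the three groups above concrete, here is a minimal C sketch using the standard SSE/SSE2 intrinsics that normally map to one representative instruction from each group. The function names (merge_low_scalar, high_to_low, interleave_low, reverse_lanes, bitwise_and, lane) are made up for this illustration, and the instruction named in each comment is only the usual mapping: a compiler is free to pick a different encoding, so check the generated assembly before drawing conclusions about a particular build.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Group 1, register-to-register moves.
   Usually MOVSS xmm1, xmm2: lane 0 comes from b, lanes 1..3 from a. */
static __m128 merge_low_scalar(__m128 a, __m128 b) {
    return _mm_move_ss(a, b);
}

/* Group 1 again. Usually MOVHLPS: the result is { b2, b3, a2, a3 }. */
static __m128 high_to_low(__m128 a, __m128 b) {
    return _mm_movehl_ps(a, b);
}

/* Group 2, data shuffling.
   Usually UNPCKLPS: the result is { a0, b0, a1, b1 }. */
static __m128 interleave_low(__m128 a, __m128 b) {
    return _mm_unpacklo_ps(a, b);
}

/* Group 2 again. Usually SHUFPS with an immediate: reverses the four lanes. */
static __m128 reverse_lanes(__m128 a) {
    return _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 1, 2, 3));
}

/* Group 3, 128-bit logical operations. Usually ANDPS. */
static __m128 bitwise_and(__m128 a, __m128 b) {
    return _mm_and_ps(a, b);
}

/* Hypothetical helper for inspecting results: returns lane i (0..3) of v. */
static float lane(__m128 v, int i) {
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40} (lane 0 first), merge_low_scalar(a, b) gives {10, 2, 3, 4} and interleave_low(a, b) gives {1, 10, 2, 20}. None of these operations does any floating-point arithmetic; they only move or combine bits.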