AMD and Intel are making mutually incompatible instructions and are using different instruction codes for almost identical instructions. This is certainly not what the IT community wants, but it is a consequence of free competition. The two companies are competing to invent new instructions and keeping their plans secret for the sake of competition. The consequence is mutually incompatible instructions. We have seen the two companies assigning different codes to the same instruction, but the worst nightmare is yet to come: assigning different instructions to the same code.
The current situation is very unfortunate for the software industry. Very few software developers are willing to bear the costs of developing, testing and maintaining separate versions of their software for AMD and Intel.
This problem is a consequence of the market situation where each company has to keep its plans secret for reasons of competition. A voluntary peace agreement is unlikely, so the only cure is a legal or political intervention. The initiative for a legal intervention may come from AMD, because the current situation is more advantageous to Intel than to AMD. The best that can come out of such a process is a public standardization committee where new instructions are discussed and approved. A less ambitions outcome would be an agreement about which part of the opcode space each company can use for its innovations.
However, such a legal process could take years, and AMD cannot remain passive in the meantime. I will therefore discuss what AMD could do in the present situation if no peace agreement with Intel can be obtained.
The history in a nutshell:
The situation of SSE5 versus AVX is particularly troublesome. We have two different schemes for coding instructions with more than two operands. These two schemes are mutually incompatible and it would be quite costly in terms of instruction decoding hardware to support both. The AVX scheme is technically superior, as I have argued elsewhere (http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html) so I have no doubt that AVX will win this competition.
AMD will have to revise their SSE5 specification to fit the AVX coding scheme. Call it SSE5R or whatever. Some of the SSE5 instructions can simply be replaced by the almost equivalent instructions in the Intel AVX and FMA instruction sets, but many of the SSE5 instructions have no equivalent Intel instructions - yet.
Here comes the next problem. How can AMD find a vacant bit combination in the AVX scheme without running the risk that Intel has something else in the pipeline using the same code for something else? I have asked in Intel's AVX forum whether there is space reserved for other vendors, but got no answer (http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30257153.aspx).
I have therefore made a list of what AMD could do if Intel refuses to assign part of the AVX code space to AMD:
(1). Use some of the unused bits in the VEX prefix to indicate new AMD instructions. This would be a very dangerous solution. One important feature of the VEX coding scheme is that it is possible to determine the instruction length based on only the VEX prefix and the mod/reg/rm byte. No matter which bit combination AMD chooses there is a possibility that Intel has already assigned the same bit combination to some other instructions with a different length. This would make an incompatibility that it is impossible to solve.
(2). Put a VEX prefix on codes that are already in use by AMD. The 3DNow instructions don't need a VEX prefix because VEX is not allowed on MMX instructions. This frees the following codes for other use:
0E, 0F, 24, 25, 7A, 7B preceded by VEX with mm = 01.
(3). Define a new VEX prefix. The current VEX prefixes begin with C4 and C5. These are the same codes as the old LES and LDS instructions, which are not allowed in 64-bit mode. In 32-bit mode, the distinction between VEX prefix and LES/LDS is based on the two leftmost bits of the subsequent byte, which are 11 if it is a VEX prefix. This bit combination would indicate an illegal register operand on LES/LDS. There is one more byte value that can be used in the same way, namely the hexadecimal value 62. This is the BOUND instruction, which is not allowed in 64-bit mode and cannot have a register operand. The 62 byte value can be used as a VEX prefix for AMD instructions. However, this is the only remaining byte value that has this property. Using this in an unwise and shortsighted way may prevent future extensions. Using 62 as a three-bytes VEX prefix analogously to C4 would not add much to the opcode space. I would prefer to make it a four-bytes VEX prefix. The first byte is 62, the next two bytes should have exactly the same meaning as for the C4 VEX prefix, including the instruction length information. A single bit of the fourth byte should indicate an AMD instruction. You could make a public announcement saying that the part of the opcode space defined by this bit = 1 is AMD territory. Everybody else stay out, unless copying an AMD instruction. The last seven bits are available for future extensions.
(4). If you fear that Intel may have other plans with the 62 byte then there are two other byte values that can be used for VEX prefixes, although this is a little more tricky. These are D4 and D5. These codes are currently assigned to the obsolete instructions AAM and AAD, which are not allowed in 64-bit mode. The distinction between VEX prefix and AAM/AAD in 32-bit mode would still be based on the two leftmost bits of the subsequent byte being 11. The second byte of the AAM and AAD instructions is almost always = 0A (= 10 decimal). This is the radix or number base for packed BCD calculations. Other values are possible, but partly undocumented and almost never used. The AMD manual and a few old Intel manuals tell that other values are possible, while most manuals specify only the value 0A. Other values than 0A are not supported by assemblers and compilers. The only values that make sense when used for radix conversions are in the interval 0x02 - 0x10. The value would have to be bigger than or equal to 0xC0 to interfere with the use as a VEX prefix. It is theoretically possible that some programmer has amused himself with using AAM or AAD for other purposes than they are intended for and with a byte value > 0xC0. This would probably be some old and obscure DOS program.
The probability that such a VEX prefix would break existing software is so low that I would consider it permissible, from a purely technical point of view. However, there is another consideration that cannot be ignored, and that has to do with PR. It is possible that a competitor or a nit-picking IT journalist would claim that the processor might be incompatible with existing software, even if there is no proof that such software exists at all. For this reason, it should be possible to switch off the VEX use in 32-bit mode. For example by a bit in the EFLAGS register.
(5). Same as (4), but available only in 64-bit mode. Assume that high-end users will use 64-bit mode anyway at that time.
Originally posted by: avk They (Intel) thought that they are gods, who need no to ask anybody to do anything.
That's why I think it is necessary to sue them for unfair competition if they refuse to cooperate. The new AVX opcode space is huge, but there is no part of this space that AMD can safely use without permission from Intel. Hitherto, AMD have been able to find obscure places in the opcode map that it was unlikely that Intel would use, but I can't see any such places in the AVX space if the principle of consistent instruction lengths should be upheld.
They have a patent sharing agreement, but as long as they don't patent their innovations they can keep them secret from each other. If they weren't keeping secrets from each other then we wouldn't have the current situation of two mutually incompatible code systems and different codes for identical instructions.
I don't think that they have an agreement about sharing the opcode space in a fair way. AMD wouldn't have crammed all their 3DNow instructions into a single opcode if they had access to a fair share of the opcode space. For SSE4, both Intel and AMD have subdivided the opcode space simply because it is filled up. The VEX code space has plenty of space and AMD should have access to a fair share of this.
Yes, but somebody has to make the compiler. The incompatibility could, in principle, be solved if all compilers had CPU-dispatching capabilities. The compiler would make several versions of your code, one for AMD SSE5, one for Intel AVX, one for older computers with SSE2, one for still older computers without SSE2, and so on. This would make your program very big, and it wouldn't make life easy for the programmer, because you have to set a lot of compiler options and pragmas to tell which versions you want and which parts of your code are so critical that you want to split it in several versions.
The only compiler I know that can do this is Intel's compiler, and it supports CPU-dispatching only for Intel processors. AMD could make a similar compiler and you would have to compile twice with two different compilers and distribute two binaries.
I know no third-party compiler that can do automatic CPU-dispatching between Intel-specific and AMD-specific instructions. I have tried to convince the Gnu people to make CPU-dispatching in the most important standard C library functions so that at least these functions can take advantage of the different instruction sets. They agree that it should be done, but apparently they don't have enough volunteers to do it. And personally, I don't have the time to do it for them. It's a lot of work, you see.
It is simply so expensive to the software industry to support multiple incompatible instruction sets that it is not done. Macintosh supports multiple incompatible CPUs by making several versions of every binary and packing them together into a bundle. Windows and Linux developers could in principle do the same, but nobody is willing to pay the costs of developing, testing and maintaining multiple versions of the software. It would be MUCH MUCH cheaper to put pressure on the two CPU vendors to agree on a common standard. If they can't talk to gether then a political or legal intervention is needed.
The idea of free market/free competition is based on the hypothesis that unrestrained competition between egoistic competitors will produce the best possible product at the lowest possible price. This hypothesis is true in some situations, and false in other situations. Economists use the term "market failure" when competition produces undesired results. For example, competition in the Olympic games produces doping, which is an undesirable result. Market failure can only be prevented through intervention or regulation. I believe that the market for x86 microprocessors fails on the following points:
1. Unfair competition. AMD does not have access to a fair share of the opcode space to use for their innovations. Historically, AMD has used obscure corners of the opcode space to avoid the risk that Intel might assign another instruction to the same code. There is no part of the new VEX opcode space that AMD can safely use without permission from Intel.
2. Technical incompatibility. AMD and Intel are assigning different codes to identical or equivalent instructions because both keep their innovations secret for as long as possible. It is so expensive for the software industry to make two different versions of their software that hardly anybody does so.
3. Short-sighted solutions. The history of the evolution of the x86 instruction set is full of shortsighted patches that are sub-optimal in a long-term perspective. For example, when the vector registers were extended from MMX to XMM, there was no plan for how to handle the predictable future extension to YMM. If such a plan had been made then we wouldn't need the complexity today of having two versions of every XMM instruction, one that zero-extends into the YMM register and one that leaves the rest of the register unchanged. A standardization committee or public discussion forum would be more likely to include long-term planning.
4. Sub-optimal solutions. Some instructions could be implemented better at no extra costs. For example, the PANDN and PALIGNR instructions would be more efficient if the two operands were swapped. A public discussion would have corrected such lapses before it was too late.
5. PR considerations often have more weight than technical considerations. Currently, we have more than a thousand instructions in the x86 instruction set. More than most programmers can remember. It would be better to have fewer instructions and make each instruction more flexible so that it would cover more applications. But there is an obvious PR value in announcing that the newest processor has a bazillion new instructions. The weird names of the instruction set extensions are obviously decided by PR people rather than by engineers.
6. Backwards compatibility is taken too far. Today's microprocessors are still supporting even the most obscure undocumented instructions of the first 8086 processor from thirty years ago, while operating systems sometimes fail to support software that is five years old. There is no technical reason for this, only a PR reason. If vendor X removed support for obsolete instructions then vendor Y would surely advertise that Y is compatible with all existing software, but X is not. The cost of supporting undocumented and obsolete instructions is actually quite high because they take up space in the overcrowded opcode map. If these codes had been eliminated then all instructions in SSSE3 and later instruction sets would have a one-byte escape code rather than a two-bytes escape code.
7. Inability to declare anything obsolete. There are many things in the x86 instruction set that needs to be cleaned up and sanitized, which an unregulated market is unable to do. A standardization committee could declare that standards-compliant software should not use a certain feature. Support for this feature could then be removed after e.g. ten years. For example, the x87 register stack is clearly obsolete. If the standard says, don't use x87 and MMX registers, then we could replace all x87 instructions by emulation after a number of years. It is quite costly in terms of silicon space and performance to support the x87 instructions. Some processors even have an extra stage in the pipeline only for rotating the x87 register stack.
8. Feedback from users is always too late. When a new instruction set is published, there is often public criticism, but then it is too late to change anything. The secrecy around innovations makes it impossible to involve the larger software community in the decision making process.