AMD and Intel are making mutually incompatible instructions and are using different instruction codes for almost identical instructions. This is certainly not what the IT community wants, but it is a consequence of free competition. The two companies are competing to invent new instructions and keeping their plans secret for the sake of competition. The consequence is mutually incompatible instructions. We have seen the two companies assigning different codes to the same instruction, but the worst nightmare is yet to come: assigning different instructions to the same code.
The current situation is very unfortunate for the software industry. Very few software developers are willing to bear the costs of developing, testing and maintaining separate versions of their software for AMD and Intel.
This problem is a consequence of the market situation where each company has to keep its plans secret for reasons of competition. A voluntary peace agreement is unlikely, so the only cure is a legal or political intervention. The initiative for a legal intervention may come from AMD, because the current situation is more advantageous to Intel than to AMD. The best that can come out of such a process is a public standardization committee where new instructions are discussed and approved. A less ambitious outcome would be an agreement about which part of the opcode space each company can use for its innovations.
However, such a legal process could take years, and AMD cannot remain passive in the meantime. I will therefore discuss what AMD could do in the present situation if no peace agreement with Intel can be obtained.
The history in a nutshell:
The situation of SSE5 versus AVX is particularly troublesome. We have two different schemes for coding instructions with more than two operands. These two schemes are mutually incompatible and it would be quite costly in terms of instruction decoding hardware to support both. The AVX scheme is technically superior, as I have argued elsewhere (http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html) so I have no doubt that AVX will win this competition.
AMD will have to revise their SSE5 specification to fit the AVX coding scheme. Call it SSE5R or whatever. Some of the SSE5 instructions can simply be replaced by the almost equivalent instructions in the Intel AVX and FMA instruction sets, but many of the SSE5 instructions have no equivalent Intel instructions - yet.
Here comes the next problem. How can AMD find a vacant bit combination in the AVX scheme without running the risk that Intel has something else in the pipeline using the same code for something else? I have asked in Intel's AVX forum whether there is space reserved for other vendors, but got no answer (http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30257153.aspx).
I have therefore made a list of what AMD could do if Intel refuses to assign part of the AVX code space to AMD:
(1). Use some of the unused bits in the VEX prefix to indicate new AMD instructions. This would be a very dangerous solution. One important feature of the VEX coding scheme is that it is possible to determine the instruction length based on only the VEX prefix and the mod/reg/rm byte. No matter which bit combination AMD chooses, there is a possibility that Intel has already assigned the same bit combination to some other instruction with a different length. This would create an incompatibility that is impossible to resolve.
(2). Put a VEX prefix on codes that are already in use by AMD. The 3DNow instructions don't need a VEX prefix because VEX is not allowed on MMX instructions. This frees the following codes for other use:
0E, 0F, 24, 25, 7A, 7B preceded by VEX with mm = 01.
(3). Define a new VEX prefix. The current VEX prefixes begin with C4 and C5. These are the same codes as the old LES and LDS instructions, which are not allowed in 64-bit mode. In 32-bit mode, the distinction between VEX prefix and LES/LDS is based on the two leftmost bits of the subsequent byte, which are 11 if it is a VEX prefix. This bit combination would indicate an illegal register operand on LES/LDS. There is one more byte value that can be used in the same way, namely the hexadecimal value 62. This is the BOUND instruction, which is not allowed in 64-bit mode and cannot have a register operand. The 62 byte value can be used as a VEX prefix for AMD instructions. However, this is the only remaining byte value that has this property. Using it in an unwise and shortsighted way may prevent future extensions. Using 62 as a three-byte VEX prefix analogously to C4 would not add much to the opcode space. I would prefer to make it a four-byte VEX prefix. The first byte is 62, and the next two bytes should have exactly the same meaning as for the C4 VEX prefix, including the instruction length information. A single bit of the fourth byte should indicate an AMD instruction. You could make a public announcement saying that the part of the opcode space defined by this bit = 1 is AMD territory. Everybody else stay out, unless copying an AMD instruction. The last seven bits are available for future extensions.
(4). If you fear that Intel may have other plans with the 62 byte then there are two other byte values that can be used for VEX prefixes, although this is a little more tricky. These are D4 and D5. These codes are currently assigned to the obsolete instructions AAM and AAD, which are not allowed in 64-bit mode. The distinction between VEX prefix and AAM/AAD in 32-bit mode would still be based on the two leftmost bits of the subsequent byte being 11. The second byte of the AAM and AAD instructions is almost always = 0A (= 10 decimal). This is the radix or number base for packed BCD calculations. Other values are possible, but partly undocumented and almost never used. The AMD manual and a few old Intel manuals tell that other values are possible, while most manuals specify only the value 0A. Other values than 0A are not supported by assemblers and compilers. The only values that make sense when used for radix conversions are in the interval 0x02 - 0x10. The value would have to be bigger than or equal to 0xC0 to interfere with the use as a VEX prefix. It is theoretically possible that some programmer has amused himself with using AAM or AAD for other purposes than they are intended for and with a byte value > 0xC0. This would probably be some old and obscure DOS program.
The probability that such a VEX prefix would break existing software is so low that I would consider it permissible, from a purely technical point of view. However, there is another consideration that cannot be ignored, and that has to do with PR. It is possible that a competitor or a nit-picking IT journalist would claim that the processor might be incompatible with existing software, even if there is no proof that such software exists at all. For this reason, it should be possible to switch off the VEX use in 32-bit mode. For example by a bit in the EFLAGS register.
(5). Same as (4), but available only in 64-bit mode. Assume that high-end users will use 64-bit mode anyway at that time.
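The prefix test described in (3) and (4) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `classify` and the label "candidate prefix" are made up for this example, and the bytes 62, D4 and D5 are the candidates proposed above, not anything AMD or Intel has defined.

```python
# Hypothetical sketch: classify the first two bytes of an instruction.
# C4/C5 are the existing VEX prefixes; 62, D4 and D5 are the candidate
# bytes proposed in the text, not an actual AMD or Intel definition.

LEGACY = {0xC4: "LES", 0xC5: "LDS", 0x62: "BOUND", 0xD4: "AAM", 0xD5: "AAD"}
EXISTING_VEX = {0xC4, 0xC5}


def classify(first: int, second: int, mode64: bool) -> str:
    """Return 'VEX', 'candidate prefix', a legacy mnemonic, or 'other'."""
    if first not in LEGACY:
        return "other"
    kind = "VEX" if first in EXISTING_VEX else "candidate prefix"
    if mode64:
        # The legacy instructions are not allowed in 64-bit mode,
        # so the byte is free to act as a prefix there.
        return kind
    # In 32-bit mode the two leftmost bits of the next byte decide:
    # 11 means prefix, anything else is the legacy instruction.
    if (second >> 6) == 0b11:
        return kind
    return LEGACY[first]
```

With this rule, the usual AAM encoding D4 0A is still read as AAM in 32-bit mode, while D4 followed by a byte of C0 or higher would be taken as a prefix. That is exactly the rare case discussed in (4).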
Originally posted by: avk They (Intel) thought that they are gods, who need not ask anybody to do anything.
That's why I think it is necessary to sue them for unfair competition if they refuse to cooperate. The new AVX opcode space is huge, but there is no part of this space that AMD can safely use without permission from Intel. Hitherto, AMD have been able to find obscure places in the opcode map that it was unlikely that Intel would use, but I can't see any such places in the AVX space if the principle of consistent instruction lengths should be upheld.
They have a patent sharing agreement, but as long as they don't patent their innovations they can keep them secret from each other. If they weren't keeping secrets from each other then we wouldn't have the current situation of two mutually incompatible code systems and different codes for identical instructions.
I don't think that they have an agreement about sharing the opcode space in a fair way. AMD wouldn't have crammed all their 3DNow instructions into a single opcode if they had access to a fair share of the opcode space. For SSE4, both Intel and AMD have subdivided the opcode space simply because it is filled up. The VEX code space has plenty of space and AMD should have access to a fair share of this.
Yes, but somebody has to make the compiler. The incompatibility could, in principle, be solved if all compilers had CPU-dispatching capabilities. The compiler would make several versions of your code, one for AMD SSE5, one for Intel AVX, one for older computers with SSE2, one for still older computers without SSE2, and so on. This would make your program very big, and it wouldn't make life easy for the programmer, because you have to set a lot of compiler options and pragmas to tell which versions you want and which parts of your code are so critical that you want to split it in several versions.
The only compiler I know that can do this is Intel's compiler, and it supports CPU-dispatching only for Intel processors. AMD could make a similar compiler and you would have to compile twice with two different compilers and distribute two binaries.
I know no third-party compiler that can do automatic CPU-dispatching between Intel-specific and AMD-specific instructions. I have tried to convince the Gnu people to make CPU-dispatching in the most important standard C library functions so that at least these functions can take advantage of the different instruction sets. They agree that it should be done, but apparently they don't have enough volunteers to do it. And personally, I don't have the time to do it for them. It's a lot of work, you see.
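The CPU-dispatching idea discussed above can be sketched as follows. This is a minimal illustration, not how any real compiler does it: the feature names are plain strings chosen for this example, the "versions" are identical Python stand-ins for code paths that would really be compiled for different instruction sets, and `make_version` and `select_version` are hypothetical names.

```python
from typing import Callable, Iterable


def make_version(label: str) -> Callable:
    """Build a stand-in for one compiled version of a dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    dot.label = label  # remember which code path this stands for
    return dot


# Ordered from most to least preferred. Each entry lists the CPU
# features (hypothetical names) that the version requires.
VERSIONS = [
    (frozenset({"avx"}), make_version("avx")),
    (frozenset({"sse5"}), make_version("sse5")),
    (frozenset({"sse2"}), make_version("sse2")),
    (frozenset(), make_version("generic")),
]


def select_version(cpu_features: Iterable[str]) -> Callable:
    """Pick the first version whose required features are all present."""
    available = set(cpu_features)
    for required, func in VERSIONS:
        if required <= available:
            return func
    raise RuntimeError("no usable version")
```

A program would call `select_version` once at startup with the features reported by the CPU and then use the returned function everywhere. The cost the text complains about is visible in the table: every entry is a separate code path that has to be written, tested and maintained.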
It is simply so expensive to the software industry to support multiple incompatible instruction sets that it is not done. Macintosh supports multiple incompatible CPUs by making several versions of every binary and packing them together into a bundle. Windows and Linux developers could in principle do the same, but nobody is willing to pay the costs of developing, testing and maintaining multiple versions of the software. It would be MUCH MUCH cheaper to put pressure on the two CPU vendors to agree on a common standard. If they can't talk together then a political or legal intervention is needed.
The idea of free market/free competition is based on the hypothesis that unrestrained competition between egoistic competitors will produce the best possible product at the lowest possible price. This hypothesis is true in some situations, and false in other situations. Economists use the term "market failure" when competition produces undesired results. For example, competition in the Olympic games produces doping, which is an undesirable result. Market failure can only be prevented through intervention or regulation. I believe that the market for x86 microprocessors fails on the following points:
1. Unfair competition. AMD does not have access to a fair share of the opcode space to use for their innovations. Historically, AMD has used obscure corners of the opcode space to avoid the risk that Intel might assign another instruction to the same code. There is no part of the new VEX opcode space that AMD can safely use without permission from Intel.
2. Technical incompatibility. AMD and Intel are assigning different codes to identical or equivalent instructions because both keep their innovations secret for as long as possible. It is so expensive for the software industry to make two different versions of their software that hardly anybody does so.
3. Short-sighted solutions. The history of the evolution of the x86 instruction set is full of shortsighted patches that are sub-optimal in a long-term perspective. For example, when the vector registers were extended from MMX to XMM, there was no plan for how to handle the predictable future extension to YMM. If such a plan had been made then we wouldn't need the complexity today of having two versions of every XMM instruction, one that zero-extends into the YMM register and one that leaves the rest of the register unchanged. A standardization committee or public discussion forum would be more likely to include long-term planning.
4. Sub-optimal solutions. Some instructions could be implemented better at no extra costs. For example, the PANDN and PALIGNR instructions would be more efficient if the two operands were swapped. A public discussion would have corrected such lapses before it was too late.
5. PR considerations often have more weight than technical considerations. Currently, we have more than a thousand instructions in the x86 instruction set. More than most programmers can remember. It would be better to have fewer instructions and make each instruction more flexible so that it would cover more applications. But there is an obvious PR value in announcing that the newest processor has a bazillion new instructions. The weird names of the instruction set extensions are obviously decided by PR people rather than by engineers.
6. Backwards compatibility is taken too far. Today's microprocessors are still supporting even the most obscure undocumented instructions of the first 8086 processor from thirty years ago, while operating systems sometimes fail to support software that is five years old. There is no technical reason for this, only a PR reason. If vendor X removed support for obsolete instructions then vendor Y would surely advertise that Y is compatible with all existing software, but X is not. The cost of supporting undocumented and obsolete instructions is actually quite high because they take up space in the overcrowded opcode map. If these codes had been eliminated then all instructions in SSSE3 and later instruction sets would have a one-byte escape code rather than a two-byte escape code.
7. Inability to declare anything obsolete. There are many things in the x86 instruction set that need to be cleaned up and sanitized, which an unregulated market is unable to do. A standardization committee could declare that standards-compliant software should not use a certain feature. Support for this feature could then be removed after e.g. ten years. For example, the x87 register stack is clearly obsolete. If the standard says, don't use x87 and MMX registers, then we could replace all x87 instructions by emulation after a number of years. It is quite costly in terms of silicon space and performance to support the x87 instructions. Some processors even have an extra stage in the pipeline only for rotating the x87 register stack.
8. Feedback from users is always too late. When a new instruction set is published, there is often public criticism, but then it is too late to change anything. The secrecy around innovations makes it impossible to involve the larger software community in the decision making process.
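The two versions of every XMM instruction mentioned in point 3 can be modelled in a few lines. This is a simplified sketch that treats a 256-bit YMM register as a Python integer; the function names are made up for this illustration.

```python
MASK128 = (1 << 128) - 1  # the low 128 bits, i.e. the XMM part


def write_xmm_legacy(ymm: int, value: int) -> int:
    """Old (non-VEX) encoding: the upper half of the YMM register is kept."""
    upper = ymm & ~MASK128
    return upper | (value & MASK128)


def write_xmm_vex(ymm: int, value: int) -> int:
    """VEX encoding: the upper half of the YMM register is set to zero."""
    return value & MASK128
```

The old register contents `ymm` are an input only in the legacy version, which is why the processor has to treat the two versions differently.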
Originally posted by: yeyang
Self-quoting doesn't make what you said a bit more credible.
I expected my readers to be sufficiently computer-literate to be able to click on a link so I didn't have to write the same long list of arguments again. Please follow the link and read the discussion thread on Aces hardware forum. Here is the link again:
http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html
When AMD published their new ISA extension named SSE5 in late August 2007, they also introduced a new instruction code format for instructions with 3 or 4 operands. When Intel presented their AVX extension in April this year they introduced another code format that also supports 3 or 4 operands. These two formats are very different. We are now in a position where AMD and Intel are using completely different coding schemes for the same instructions.
This is every programmer's nightmare! I cannot imagine any significant number of programmers making three versions of their code: one for AMD, one for Intel, and one for compatibility with older processors.
The forking of instruction sets and coding schemes is one of the less desirable consequences of free competition. We would all prefer some kind of international standardization committee that could approve new instruction codes. Such a committee would be reluctant to accept new shortsighted patches that add just another complication to instruction decoding. They would have weeded out the bizarre undocumented instructions from the old 8086 days that are still supported. And they might not accept the addition of new instructions to the already bulging instruction set mainly for marketing reasons with little technical benefit. Unfortunately, there is little hope that such a committee will be formed.
I have looked into the details of the two competing instruction formats and made a comparison:
* Both ISA extensions are compatible with all existing code.
* SSE5 supports 3 operands for new instructions only. AVX extends existing instructions to 3 operands as well. Almost all existing instructions on XMM registers are extended to 3 operands, and the code format makes room for also extending general-purpose register instructions to 3 operands.
* SSE5 supports instructions with 4 operands, but only if two of the operands are the same register. AVX supports any combination of 4 registers by adding an extra code byte. Future extension to 5 operands is possible.
* SSE5 makes instructions longer. AVX makes some instructions longer and some instructions shorter, but most instructions keep the same length as before despite containing one more register operand and other new information.
* SSE5 adds yet another complication to the already very complicated instruction decoding procedure. AVX makes instruction decoding simpler by sanitizing a lot of old patches. The many prefixes and escape bytes that pester the current instruction set are joined together into a single "VEX" prefix that is 2 or 3 bytes long.
* AVX supports the extension of the 128-bit vector registers (XMM registers) to 256 bits (YMM registers) with room for further extensions in the future. SSE5 has no room for new extensions.
* AVX has 3 unused bits for future extensions to the now overloaded opcode map. This means no new shortsighted patches for a foreseeable future.
Before I saw the AVX documentation, I would have denied that it was possible to add so much new information without making instructions longer. The trick is that it makes one long prefix instead of many short prefixes. One or a few bits in the new VEX prefix contains the same information as a whole 8-bit or even 16-bit prefix or escape code in the current coding scheme. The two VEX prefixes are made out of two obsolete instructions, LDS and LES, which are valid in 16- and 32-bit mode but invalid in 64-bit mode. Certain bits in the VEX prefix that indicate register extensions available only in 64-bit mode are placed in such a way in the VEX prefix that the only values valid in 32-bit mode form an invalid register operand if interpreted as a legacy LDS or LES instruction. This is a solution no less ingenious than the x64 extension invented by AMD.
Looking at the advantages of AVX over SSE5 there can be no doubt that AMD has no choice but to adopt AVX. There is no way AMD can stay in competition without supporting the new 256-bit vectors and the 3-operand version of all existing XMM instructions. And, incidentally, it will be easier to implement the new 3-operand instructions for AMD than it is for Intel because the current Intel microarchitecture does not allow micro-operations with more than two inputs, while the AMD microarchitecture has no such limitation.
Let me explain the advantage of 3-operand instructions to those who don't know what this is about. Most of the current instructions place the result of a calculation in the same register as one of the input operands, e.g.:
A = A * B.
With a 3-operand version, you can do:
C = A * B.
This gives the programmer the freedom to reuse the original value of A in other calculations without having to copy it to another register. The result is fewer register-to-register moves and hence more efficient and compact code.
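As a rough illustration of the saved move, here is a small sketch that computes A * B + A on a toy register file, once with destructive 2-operand instructions and once with 3-operand instructions. The register names, the function names and the instruction counting are assumptions of this example only; real code would be assembly.

```python
def two_operand(a: int, b: int):
    """dst = dst op src: A must be copied before it is multiplied."""
    regs = {"A": a, "B": b, "T": 0}
    count = 0
    regs["T"] = regs["A"]; count += 1    # mov T, A   (extra copy)
    regs["T"] *= regs["B"]; count += 1   # mul T, B   (T = T * B)
    regs["T"] += regs["A"]; count += 1   # add T, A   (T = T + A)
    return regs["T"], count


def three_operand(a: int, b: int):
    """dst = src1 op src2: A survives the multiplication, no copy needed."""
    regs = {"A": a, "B": b, "C": 0}
    count = 0
    regs["C"] = regs["A"] * regs["B"]; count += 1   # mul C, A, B
    regs["C"] = regs["C"] + regs["A"]; count += 1   # add C, C, A
    return regs["C"], count
```

Both functions return the same result, but the 2-operand version needs three instructions where the 3-operand version needs two.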
The SSE5 instructions will suffer the same fate as AMD's 3DNow instructions. Nobody ever used the 3DNow instructions because they are not supported in Intel processors. They are superseded by the more efficient SSE instructions, but AMD have to keep supporting them in all their future processors for the sake of backwards compatibility. Let's hope that AMD have the guts to drop SSE5 altogether before it's too late. There has been some speculation that they might.
Too bad that AMD haven't seen this coming before they published their SSE5 spec. Intel must have been able to keep their plans secret despite the patent sharing agreement between AMD and Intel. Maybe there is no patent on AVX?
I am not talking about which instructions are more useful, I am talking about the way they are encoded. The AVX scheme has plenty of room for future extensions without making instructions longer: The L bit makes room for YMM registers, the three unused mmm bits make space for new opcodes, the two unused pp bits make space for other operand types or still larger registers or whatever, the four unused bits in the immediate byte allow for a fifth operand. The DREX scheme has none of this. The AVX scheme makes non-destructive forms of all XMM instructions. The SSE5 instructions do not have non-destructive forms because two of the operands must be the same register.
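To make the fields named above concrete, here is a minimal sketch of a decoder for the 2-byte (C5) and 3-byte (C4) VEX prefixes. The bit layout follows the published AVX encoding as I understand it; the function name and the dictionary keys are choices made for this example. R, X, B and vvvv are stored inverted in the prefix and are returned here in non-inverted form.

```python
def decode_vex(code: bytes) -> dict:
    """Split a 2-byte (C5) or 3-byte (C4) VEX prefix into its bit fields."""
    if code[0] == 0xC5:
        last = code[1]
        r = (last >> 7) ^ 1      # R is stored inverted
        x, b = 0, 0              # not present in the 2-byte form
        mmmmm, w = 1, 0          # implied values in the 2-byte form
        length = 2
    elif code[0] == 0xC4:
        mid, last = code[1], code[2]
        r = (mid >> 7) ^ 1
        x = ((mid >> 6) & 1) ^ 1
        b = ((mid >> 5) & 1) ^ 1
        mmmmm = mid & 0x1F       # replaces the old escape bytes
        w = last >> 7
        length = 3
    else:
        raise ValueError("not a VEX prefix")
    return {
        "length": length,
        "R": r, "X": x, "B": b,
        "mmmmm": mmmmm,
        "W": w,
        "vvvv": ((last >> 3) & 0xF) ^ 0xF,  # extra source register, inverted
        "L": (last >> 2) & 1,               # 0 = XMM, 1 = YMM
        "pp": last & 3,                     # replaces the 66/F3/F2 prefixes
    }
```

For example, the prefix C5 F4 and the longer equivalent C4 E1 74 decode to the same fields apart from the length: vvvv = 1, L = 1, pp = 0, mmmmm = 1. Only a few of the 32 possible mmmmm values are in use, which is the room for new opcodes mentioned above.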
A new RISC instruction set? Intel did that first with Itanium - not a big success. The market wants compatibility, not on the assembly language level but on the binary level. A new ISA will have to be compatible with existing code. A microprocessor that supports both x86 and RISC86 would probably be accepted by the market, but it might end up running x86 code faster than RISC86 because the CISC code takes less space in the code cache than RISC code.
I don't see the advantage of a new ISA encoding scheme. If you look at the disassembly of a typical program you will see that > 90% of the instructions are very short instructions like push, pop, call, ret, mov, inc, add, cmp, conditional jump, etc. If you make a RISC code for the same instructions, you would need 8 bytes for each instruction. This would increase the code size by approximately a factor 3. And you would be unable to code instructions that have a 64-bit immediate, such as MOV RAX,big_constant. You will see the capacity of your code cache reduced by a factor 3 and hardly any gain in decoding speed. The VEX system simplifies decoding by allowing only certain specific instruction lengths. This is a reasonable compromise between RISC and CISC in my opinion.
I agree that there are a lot of things that could be cleaned up and sanitized if a new mode is introduced for whatever reason. AMD removed the most obviously obsolete instructions when they designed the x64 mode, but there is much more that could be cleaned up. Microsoft tried to ban the old x87 registers in x64 Windows, according to the first preliminary specs, but for some reason they changed their mind and supported x87 in Win64.
Things that could be cleaned up if a new mode is defined:
However, these advantages are not sufficient for justifying yet another CPU mode and yet another software standard. But if a new mode is introduced for some other reason then these changes should be included.
guys need some help..
Install x86_64 on AMD problem
Just built new computer with AMD Athlon XP 2600+Barton on Asus A7N8X-X-UAY motherboard, 512 MB of RAM, and an 80 GB HD. Want to have a dual boot system so I first installed W98. No problems. Used FIPS to shrink windows and created a new partition for Linux. All went well. Downloaded Fedora Core 1 (Yarrow x86_64) ISO and burned onto CD. Booted from CD to install Fedora and got message, "Your CPU does not support long mode. Use a 32 bit distribution." Install will not move forward. Should I install i386? Please help.
That's right, Athlon XP does not support long (64-bit) mode. Therefore, you should install the i386 (32-bit) version of the software. If you are interested in what exactly your CPU is capable of, then you can run the CPU-Z utility.
I ran into the same issue as agner. Thanks for the help guys I really do appreciate it. I was about to start pulling hair out.
Originally posted by: backlinkbuilder I ran into the same issue as agner. Thanks for the help guys I really do appreciate it. I was about to start pulling hair out.
This bot makes me laugh, to the favorites.
Good idea, Agner! Alas, I think that Intel won't make any steps to create such a public standardization committee.
Originally posted by: sj1009 I think that Intel won't make any steps to create such a public standardization committee.
Of course not. The present situation gives Intel a big competitive advantage over AMD. That's why somebody has to put pressure on Intel to cooperate. I don't know if AMD can sue Intel for unfair competition. I think they have tried that already, but I don't know. At least that would be difficult and expensive and take a very long time.
The pressure might come from government organizations, from the EU, from professional organizations like IEEE, from the software industry, from IT journalists, from political organizations, ...
agner: You've said: "from the software industry". I think that it would be the charcoal which will ignite the petition's bonfire. And you could be the match.
When I talked about an online petition, I meant www.petitiononline.com, the most famous site of that kind. Would you try to start this petition? I think that your first message in this topic is perfect as the base.
Thanks for the reference to www.petitiononline.com
The issue is very technical. Only people who understand it would sign it. People who don't understand would just say that competition is good, because that's what they have been brainwashed to believe. Well, I don't doubt that the competition between Intel and AMD has been good for price and quality - but not for compatibility.
Do you really think that Intel would change their profitable policy if a few hundred people sign a petition?
I think I do understand your doubt about those people, who would or would not sign this petition. Yes, most of them are brainwashed, but, I hope, not all of them. IMHO, every asm-coder would sign. All you need is to let them know about the real situation.
About Intel's profitable policy: who knows. As you know, people here in Europe really don't like monopolies (I'm not sure about the rest of the world), so if you describe in your petition how one company prevents another company from selling its products (what about the "GenuineIntel" policy, where a program or a library compiled by the Intel compiler doesn't "see" some or all of the SSE versions on non-Intel CPUs, and thus runs in MMX mode?), this fact can raise a discussion in the world programming community. Will Intel change its policy? It depends on how large the discussion will be.
Originally posted by: avk every asm-coder would sign.
There are not many asm coders left. A petition in itself wouldn't move anything, but it may help getting media attention. It would definitely help if we could find an IT journalist willing to make a campaign. It would have to be someone from a major PC magazine or an influential blogger. He would press Intel and AMD for a comment. And he might get comments from various software companies. The petition might give him an excuse for keeping writing about it. Now professor so-and-so has signed the petition and he would get a comment from the professor.
Maybe you are right about putting the Intel compiler issue into it. That's something people understand if you simplify it as: "Your software checks if it is running on an Intel machine and punishes you with bad performance if not".
I tried a long time ago to make a journalist write about the Intel compiler issue. He promised to write about it but never did.
Anybody out there know a good IT journalist?
Do you remember that topic? I posted a link there to an article by Joel Hruska of Ars Technica about his CPUID manipulation. What about inviting him?
I have argued for a public standardization of the x86 instruction set at www.agner.org/optimize/blog/read.php?i=25
Comments are welcome
Just a little correction: 3DNow! was introduced with the AMD K6-2 in 1998, not 1997.
I believe that your initiative would be more productive in the form of an online petition.
to avk:
I fully support you. I propose a RISC CPU which has, for example, 16 pipelines, but where execution units, caches and registers can be shared across pipelines! I also suggest direct instructions instead of the Instruction->MOPs hierarchy! An instruction can manipulate operands up to the size of a cache line, and the number of operands can vary.
they should have contracts that are not informed hmmmm
_______
Zen
Edit: Removed advertising from post.
Good news: AMD have changed the specifications for the future SSE5 instructions to make them more compatible with Intel's AVX scheme, see: http://support.amd.com/us/Processor_TechDocs/43479.pdf
To avoid confusion, they have also changed the name SSE5 to XOP, FMA4 and CVT16.
Thank you so much, AMD. Please tell us if Bulldozer will support all of XOP, FMA4 and CVT16?
Bad news: Intel have recently changed the specs for their FMA instructions, so that the compatibility is lost again.
How ironic:
In the initial preliminary specifications, AMD had 3 different operands on FMA instructions, and Intel had 4 operands.
Now, both companies have revised their specifications: AMD now has 4 operands on FMA instructions and Intel has 3 operands.
Apparently, Intel are to blame for not informing AMD in time about this change. They certainly knew that AMD planned to make compatible instructions because there have been patent sharing negotiations about this issue.
Maintaining compatibility seems to be a game of running after a moving target, as long as both companies keep secrets from each other rather than cooperate.
Can somebody from AMD please comment on how you will react to Intel's latest change in their FMA spec. Will future AMD processors use 3 or 4 operands on FMA instructions, or support both forms?
As far as I can see, semantic compatibility is moot because these instruction set extensions (from AMD and Intel) are not syntactically compatible anyway.
BTW I don't think you can patent an instruction format (i.e., encoding), but the implementation.
Some of the new AMD instructions are different from anything Intel have, for example the half precision floating point calculations. Some instructions are identical in every way, for example PTEST. Some instructions serve the same need but are slightly different, for example AMD's VPCMOV and Intel's BLEND instructions.
AMD changed the coding of their FMA instructions to make them fully compatible with Intel's instructions. Unfortunately, Intel changed the specs for their FMA instructions so that compatibility is lost once again.
This situation is intolerable to the software community. We must find a way to standardize the x86 instruction set and force AMD and Intel to cooperate on instruction formats rather than playing tricks on each other for the sake of short term PR gains.
Agner: It seems that AMD has heard you. But, alas, Intel did a dirty trick.
Originally posted by: avk Agner: It seems that AMD has heard you. But, alas, Intel did a dirty trick.
Yes, AMD have certainly done what I have argued so heavily for here and elsewhere - thank you very much for that - but I have no idea whether they would have done the same had I not voiced my opinion. I don't know what the motives behind Intel's change were.
All these problems could be avoided if we had a public forum for discussion of new instructions, supported by both Intel and AMD.