
agner · ‎08-06-2008

AMD and Intel are making mutually incompatible instructions and are using different instruction codes for almost identical instructions. This is certainly not what the IT community wants, but it is a consequence of free competition. The two companies are competing to invent new instructions and keeping their plans secret for the sake of competition. The consequence is mutually incompatible instructions. We have seen the two companies assigning different codes to the same instruction, but the worst nightmare is yet to come: assigning different instructions to the same code.

The current situation is very unfortunate for the software industry. Very few software developers are willing to bear the costs of developing, testing and maintaining separate versions of their software for AMD and Intel.

This problem is a consequence of the market situation where each company has to keep its plans secret for reasons of competition. A voluntary peace agreement is unlikely, so the only cure is a legal or political intervention. The initiative for a legal intervention may come from AMD, because the current situation is more advantageous to Intel than to AMD. The best that can come out of such a process is a public standardization committee where new instructions are discussed and approved. A less ambitious outcome would be an agreement about which part of the opcode space each company can use for its innovations.

However, such a legal process could take years, and AMD cannot remain passive in the meantime. I will therefore discuss what AMD could do in the present situation if no peace agreement with Intel can be obtained.

The history in a nutshell:

AMD invented 3DNow, Intel invented SSE, Intel won. AMD had to copy SSE.
AMD invented x64, Intel invented IA64, AMD won. Intel copied x64.
AMD invented SSE5, Intel invented AVX, Intel won. AMD will have to copy AVX.

The situation of SSE5 versus AVX is particularly troublesome. We have two different schemes for coding instructions with more than two operands. These two schemes are mutually incompatible, and it would be quite costly in terms of instruction decoding hardware to support both. The AVX scheme is technically superior, as I have argued elsewhere (http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html), so I have no doubt that AVX will win this competition.

AMD will have to revise their SSE5 specification to fit the AVX coding scheme. Call it SSE5R or whatever. Some of the SSE5 instructions can simply be replaced by the almost equivalent instructions in the Intel AVX and FMA instruction sets, but many of the SSE5 instructions have no equivalent Intel instructions - yet.

Here comes the next problem. How can AMD find a vacant bit combination in the AVX scheme without running the risk that Intel has something else in the pipeline using the same code for something else? I have asked in Intel's AVX forum whether there is space reserved for other vendors, but got no answer (http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30257153.aspx).

I have therefore made a list of what AMD could do if Intel refuses to assign part of the AVX code space to AMD:

(1). Use some of the unused bits in the VEX prefix to indicate new AMD instructions. This would be a very dangerous solution. One important feature of the VEX coding scheme is that the instruction length can be determined from only the VEX prefix and the mod/reg/rm byte. No matter which bit combination AMD chooses, there is a possibility that Intel has already assigned the same bit combination to other instructions with a different length. This would create an incompatibility that is impossible to resolve.

(2). Put a VEX prefix on codes that are already in use by AMD. The 3DNow instructions don't need a VEX prefix because VEX is not allowed on MMX instructions. This frees the following codes for other use:

0E, 0F, 24, 25, 7A, 7B preceded by VEX with mm = 01.

(3). Define a new VEX prefix. The current VEX prefixes begin with C4 and C5. These are the same codes as the old LES and LDS instructions, which are not allowed in 64-bit mode. In 32-bit mode, the distinction between VEX prefix and LES/LDS is based on the two leftmost bits of the subsequent byte, which are 11 if it is a VEX prefix. This bit combination would indicate an illegal register operand for LES/LDS. There is one more byte value that can be used in the same way, namely the hexadecimal value 62. This is the BOUND instruction, which is not allowed in 64-bit mode and cannot have a register operand. The 62 byte value can be used as a VEX prefix for AMD instructions. However, this is the only remaining byte value that has this property. Using it in an unwise and shortsighted way may prevent future extensions. Using 62 as a three-byte VEX prefix analogous to C4 would not add much to the opcode space. I would prefer to make it a four-byte VEX prefix. The first byte is 62, and the next two bytes should have exactly the same meaning as for the C4 VEX prefix, including the instruction length information. A single bit of the fourth byte should indicate an AMD instruction. You could make a public announcement saying that the part of the opcode space defined by this bit = 1 is AMD territory. Everybody else stay out, unless copying an AMD instruction. The remaining seven bits are available for future extensions.

(4). If you fear that Intel may have other plans for the 62 byte, then there are two other byte values that can be used for VEX prefixes, although this is a little more tricky. These are D4 and D5. These codes are currently assigned to the obsolete instructions AAM and AAD, which are not allowed in 64-bit mode. The distinction between VEX prefix and AAM/AAD in 32-bit mode would still be based on the two leftmost bits of the subsequent byte being 11. The second byte of the AAM and AAD instructions is almost always 0A (= 10 decimal). This is the radix, or number base, for unpacked BCD calculations. Other values are possible, but partly undocumented and almost never used. The AMD manual and a few old Intel manuals state that other values are possible, while most manuals specify only the value 0A. Values other than 0A are not supported by assemblers and compilers. The only values that make sense for radix conversions are in the interval 0x02 - 0x10. The value would have to be greater than or equal to 0xC0 to interfere with the use as a VEX prefix. It is theoretically possible that some programmer has amused himself by using AAM or AAD for purposes other than those intended, with a byte value of 0xC0 or more. This would probably be some old and obscure DOS program.

The probability that such a VEX prefix would break existing software is so low that I would consider it permissible from a purely technical point of view. However, there is another consideration that cannot be ignored, and that has to do with PR. A competitor or a nit-picking IT journalist might claim that the processor could be incompatible with existing software, even if there is no proof that such software exists at all. For this reason, it should be possible to switch off this use of VEX in 32-bit mode, for example by a bit in the EFLAGS register.

(5). Same as (4), but available only in 64-bit mode. Assume that high-end users will use 64-bit mode anyway at that time.
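Item (1) relies on the fact that a decoder can find the instruction length from the VEX prefix and the mod/reg/rm byte. The part of that calculation that depends on the mod/reg/rm byte can be sketched in C as below. This is a minimal sketch assuming 32- or 64-bit addressing (no 67h address-size override); the VEX prefix (2 or 3 bytes), the opcode byte and any immediate must be added separately, and the opcode-dependent immediate size is left out. The function name `modrm_tail_length` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes taken by ModRM, an optional SIB byte and an optional displacement,
 * using 32/64-bit addressing rules (no 67h address-size override).
 * p points at the ModRM byte; a SIB byte, if present, follows it. */
size_t modrm_tail_length(const uint8_t *p)
{
    assert(p != NULL);
    uint8_t mod = p[0] >> 6;
    uint8_t rm = p[0] & 7;
    size_t len = 1;                     /* the ModRM byte itself */

    if (mod == 3)                       /* register operand: nothing follows */
        return len;

    if (rm == 4) {                      /* a SIB byte follows */
        len += 1;
        if (mod == 0 && (p[1] & 7) == 5)
            return len + 4;             /* SIB with no base register: disp32 */
    } else if (mod == 0 && rm == 5) {
        return len + 4;                 /* disp32 (RIP-relative in 64-bit mode) */
    }

    if (mod == 1)
        len += 1;                       /* disp8 */
    else if (mod == 2)
        len += 4;                       /* disp32 */
    return len;
}
```

For example, ModRM 44 followed by SIB 24 gives 3 bytes: the ModRM byte, the SIB byte and an 8-bit displacement. A foreign instruction that reused a VEX bit pattern but had a different length would break exactly this kind of calculation.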
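Items (3) to (5) all rest on the same test: in 32-bit mode, a candidate byte is treated as a prefix only if the two leftmost bits of the next byte are 11, which would be an illegal register operand for LES/LDS/BOUND or an immediate of 0xC0 or more for AAM/AAD. A minimal C sketch of that test follows. Only C4 and C5 are real VEX prefixes; treating 62, D4 and D5 as prefixes models the proposals above, not any existing processor, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum first_byte_meaning {
    NOT_CANDIDATE,      /* not one of C4, C5, 62, D4, D5 */
    LEGACY_INSTRUCTION, /* LES, LDS, BOUND, AAM or AAD */
    VEX_C4,             /* existing 3-byte VEX prefix */
    VEX_C5,             /* existing 2-byte VEX prefix */
    PROPOSED_PREFIX     /* 62, D4 or D5 read as a prefix: proposal only */
};

/* Interpret byte b0 followed by byte b1. In 32-bit mode a candidate byte is
 * a prefix only if b1 has its two leftmost bits set (mod = 11). In 64-bit
 * mode the legacy instructions are invalid, so every candidate is a prefix. */
enum first_byte_meaning classify(uint8_t b0, uint8_t b1, bool mode64)
{
    bool candidate = b0 == 0xC4 || b0 == 0xC5 || b0 == 0x62 ||
                     b0 == 0xD4 || b0 == 0xD5;
    if (!candidate)
        return NOT_CANDIDATE;
    if (!mode64 && (b1 & 0xC0) != 0xC0)
        return LEGACY_INSTRUCTION;
    if (b0 == 0xC4)
        return VEX_C4;
    if (b0 == 0xC5)
        return VEX_C5;
    return PROPOSED_PREFIX;
}
```

With this rule, the ordinary AAM encoding D4 0A stays AAM in 32-bit mode, while D4 followed by C1 would be read as a prefix.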

avk · ‎08-06-2008

Good idea, Agner! Alas, I think that Intel won't take any steps toward creating such a public standardization committee. They (Intel) think that they are gods who need not ask anybody for anything. BTW, I don't think that the situation will become that bad (different instructions with the same opcode), because I can see that this has not happened yet.
IMHO, SSE5 vs. AVX & FMA is a similar situation to 3DNow! vs. SSE, i.e. AMD's solution was released earlier than Intel's. Of course, it is not good to have similar instruction sets for the same work, but that is harsh reality.

agner · ‎08-07-2008

Originally posted by: avk They (Intel) thought that they are gods, who need no to ask anybody to do anything.

That's why I think it is necessary to sue them for unfair competition if they refuse to cooperate. The new AVX opcode space is huge, but there is no part of this space that AMD can safely use without permission from Intel. Hitherto, AMD has been able to find obscure places in the opcode map that Intel was unlikely to use, but I can't see any such places in the AVX space if the principle of consistent instruction lengths is to be upheld.

avk · ‎08-07-2008

Well, I believe that some sort of agreement does exist between AMD and Intel, and we ordinary people just know nothing about it.

agner · ‎08-08-2008

They have a patent sharing agreement, but as long as they don't patent their innovations they can keep them secret from each other. If they weren't keeping secrets from each other then we wouldn't have the current situation of two mutually incompatible code systems and different codes for identical instructions.

I don't think that they have an agreement about sharing the opcode space in a fair way. AMD wouldn't have crammed all their 3DNow instructions into a single opcode if they had access to a fair share of the opcode space. For SSE4, both Intel and AMD have subdivided the opcode space simply because it is filled up. The VEX opcode space has plenty of room, and AMD should have access to a fair share of it.

ryta1203 · ‎08-08-2008

Shouldn't this incompatibility be transparent to the developer through the compiler? Isn't it the compiler engineer's problem to deal with this type of thing? Why would you need to look at the hex?

Or am I misunderstanding your point?

agner · ‎08-09-2008

Yes, but somebody has to make the compiler. The incompatibility could, in principle, be solved if all compilers had CPU-dispatching capabilities. The compiler would make several versions of your code: one for AMD SSE5, one for Intel AVX, one for older computers with SSE2, one for still older computers without SSE2, and so on. This would make your program very big, and it wouldn't make life easy for the programmer, because you would have to set a lot of compiler options and pragmas to specify which versions you want and which parts of your code are so critical that you want to split them into several versions.
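A minimal sketch of the CPU-dispatching idea in C: several versions of one function, and a selector that picks one from feature flags. The `cpu_features` struct and the `sum_*` functions are hypothetical. A real program would fill the flags from CPUID (GCC, for example, provides `__builtin_cpu_supports`), and each version would be compiled with different instruction-set options; here the bodies are identical placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flags, filled from CPUID in a real program. */
struct cpu_features {
    bool sse2;
    bool avx;   /* Intel scheme */
    bool sse5;  /* AMD scheme */
};

typedef float (*sum_fn)(const float *x, size_t n);

/* Placeholder versions: in practice each would be built for a different
 * instruction set; here they share one portable loop. */
float sum_generic(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
float sum_sse2(const float *x, size_t n) { return sum_generic(x, n); }
float sum_avx(const float *x, size_t n)  { return sum_generic(x, n); }
float sum_sse5(const float *x, size_t n) { return sum_generic(x, n); }

/* Choose the best available version once, e.g. at program start. */
sum_fn select_sum(const struct cpu_features *f)
{
    assert(f != NULL);
    if (f->avx)
        return sum_avx;
    if (f->sse5)
        return sum_sse5;
    if (f->sse2)
        return sum_sse2;
    return sum_generic;
}
```

Every extra branch in `select_sum` means another version to build and test, which is the cost the post is pointing at.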

The only compiler I know that can do this is Intel's compiler, and it supports CPU-dispatching only for Intel processors. AMD could make a similar compiler and you would have to compile twice with two different compilers and distribute two binaries.

I know of no third-party compiler that can do automatic CPU-dispatching between Intel-specific and AMD-specific instructions. I have tried to convince the GNU people to implement CPU-dispatching in the most important standard C library functions so that at least these functions can take advantage of the different instruction sets. They agree that it should be done, but apparently they don't have enough volunteers to do it. And personally, I don't have the time to do it for them. It's a lot of work, you see.

It is simply so expensive for the software industry to support multiple incompatible instruction sets that it is not done. Macintosh supports multiple incompatible CPUs by making several versions of every binary and packing them together into a bundle. Windows and Linux developers could in principle do the same, but nobody is willing to pay the costs of developing, testing and maintaining multiple versions of the software. It would be MUCH MUCH cheaper to put pressure on the two CPU vendors to agree on a common standard. If they can't talk to each other, then a political or legal intervention is needed.

ryta1203 · ‎08-13-2008

Wouldn't a "political or legal intervention" undermine the idea of a free market and free competition? Also, I believe that this "forced standardization" would inhibit the possible development of future, better technologies. This is, of course, JMO.

agner · ‎08-14-2008

The idea of free market/free competition is based on the hypothesis that unrestrained competition between egoistic competitors will produce the best possible product at the lowest possible price. This hypothesis is true in some situations, and false in other situations. Economists use the term "market failure" when competition produces undesired results. For example, competition in the Olympic games produces doping, which is an undesirable result. Market failure can only be prevented through intervention or regulation. I believe that the market for x86 microprocessors fails on the following points:

1. Unfair competition. AMD does not have access to a fair share of the opcode space to use for their innovations. Historically, AMD has used obscure corners of the opcode space to avoid the risk that Intel might assign another instruction to the same code. There is no part of the new VEX opcode space that AMD can safely use without permission from Intel.

2. Technical incompatibility. AMD and Intel are assigning different codes to identical or equivalent instructions because both keep their innovations secret for as long as possible. It is so expensive for the software industry to make two different versions of their software that hardly anybody does so.

3. Short-sighted solutions. The history of the evolution of the x86 instruction set is full of shortsighted patches that are sub-optimal in a long-term perspective. For example, when the vector registers were extended from MMX to XMM, there was no plan for how to handle the predictable future extension to YMM. If such a plan had been made then we wouldn't need the complexity today of having two versions of every XMM instruction, one that zero-extends into the YMM register and one that leaves the rest of the register unchanged. A standardization committee or public discussion forum would be more likely to include long-term planning.

4. Sub-optimal solutions. Some instructions could be implemented better at no extra costs. For example, the PANDN and PALIGNR instructions would be more efficient if the two operands were swapped. A public discussion would have corrected such lapses before it was too late.

5. PR considerations often have more weight than technical considerations. Currently, we have more than a thousand instructions in the x86 instruction set. More than most programmers can remember. It would be better to have fewer instructions and make each instruction more flexible so that it would cover more applications. But there is an obvious PR value in announcing that the newest processor has a bazillion new instructions. The weird names of the instruction set extensions are obviously decided by PR people rather than by engineers.

6. Backwards compatibility is taken too far. Today's microprocessors still support even the most obscure undocumented instructions of the first 8086 processor from thirty years ago, while operating systems sometimes fail to support software that is five years old. There is no technical reason for this, only a PR reason. If vendor X removed support for obsolete instructions, then vendor Y would surely advertise that Y is compatible with all existing software, but X is not. The cost of supporting undocumented and obsolete instructions is actually quite high because they take up space in the overcrowded opcode map. If these codes had been eliminated, then all instructions in SSSE3 and later instruction sets would have a one-byte escape code rather than a two-byte escape code.

7. Inability to declare anything obsolete. There are many things in the x86 instruction set that need to be cleaned up and sanitized, which an unregulated market is unable to do. A standardization committee could declare that standards-compliant software should not use a certain feature. Support for this feature could then be removed after e.g. ten years. For example, the x87 register stack is clearly obsolete. If the standard says, don't use x87 and MMX registers, then we could replace all x87 instructions by emulation after a number of years. It is quite costly in terms of silicon space and performance to support the x87 instructions. Some processors even have an extra stage in the pipeline only for rotating the x87 register stack.

8. Feedback from users is always too late. When a new instruction set is published, there is often public criticism, but then it is too late to change anything. The secrecy around innovations makes it impossible to involve the larger software community in the decision making process.
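Point 3 refers to the two behaviors AVX defines for 128-bit writes to a vector register: legacy SSE encodings leave bits 255:128 of the YMM register unchanged, while the VEX.128 encodings zero them. A minimal C model of that difference follows, with the register represented as four 64-bit lanes; `ymm_t` and the two function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 256-bit YMM register as four 64-bit lanes; q[0] and q[1] form the XMM part. */
typedef struct {
    uint64_t q[4];
} ymm_t;

/* Legacy SSE encoding writing a 128-bit result: bits 255:128 keep their old value. */
void write_xmm_legacy(ymm_t *r, uint64_t lo, uint64_t hi)
{
    assert(r != NULL);
    r->q[0] = lo;
    r->q[1] = hi;
}

/* VEX.128 encoding of the same operation: bits 255:128 are cleared. */
void write_xmm_vex128(ymm_t *r, uint64_t lo, uint64_t hi)
{
    assert(r != NULL);
    r->q[0] = lo;
    r->q[1] = hi;
    r->q[2] = 0;
    r->q[3] = 0;
}
```

The legacy form makes the result depend on the old upper half, which is why the two kinds of 128-bit instructions have to coexist.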

yeyang · ‎08-20-2008

Who says AVX is superior to SSE5? Self-quoting doesn't make what you said a bit more credible.

In terms of instruction encoding, SSE5 is an addition to current SSEx. AVX is a replacement of current SSEx. In terms of instruction semantics, SSE5 instructions seem more generic while AVX instructions seem more specialized. I think we can still see the shadows of RISC and CISC in these two, respectively.

At any rate, the design philosophies of SSE5 and AVX are very different, even though the instructions of the two overlap somewhat in functionality. This is unlike 3DNow vs. SSE, IMHO.

agner · ‎08-21-2008

Originally posted by: yeyang

Self-quoting doesn't make what you said a bit more credible.

I expected my readers to be sufficiently computer-literate to be able to click on a link so I didn't have to write the same long list of arguments again. Please follow the link and read the discussion thread on Aces hardware forum. Here is the link again:

http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html

yeyang · ‎08-22-2008

The thing is, your arguments there are not correct, and I see no reason why anyone should take your word for it. Besides, isn't copy-and-paste a basic computer skill? It's so easy that I'll do it for you here. I'll even comment on them inline.

When AMD published their new ISA extension named SSE5 in late August 2007, they also introduced a new instruction code format for instructions with 3 or 4 operands. When Intel presented their AVX extension in April this year they introduced another code format that also supports 3 or 4 operands. These two formats are very different. We are now in a position where AMD and Intel are using completely different coding schemes for the same instructions.

Actually it is Intel that is using a completely different coding scheme for AVX; the new scheme completely replaces the instruction formats of SSE through SSE4.2. In contrast, AMD's SSE5 simply adds a DREX byte to the instructions and leaves the format of SSE through SSE4a intact.

This is every programmer's nightmare! I cannot imagine any significant number of programmers making three versions of their code: one for AMD, one for Intel, and one for compatibility with older processors.

In any case, this is not the programmer's problem, because compilers should take care of that.

The forking of instruction sets and coding schemes is one of the less desirable consequences of free competition. We would all prefer some kind of international standardization committee that could approve new instruction codes. Such a committee would be reluctant to accept new shortsighted patches that add just another complication to instruction decoding. They would have weeded out the bizarre undocumented instructions from the old 8086 days that are still supported. And they might not accept the addition of new instructions to the already bulging instruction set mainly for marketing reasons with little technical benefit. Unfortunately, there is little hope that such a committee will be formed.

You first say we should do something, then say that there is little hope for it. I don't understand your logic, or what you are talking about.

I have looked into the details of the two competing instruction formats and made a comparison:

* Both ISA extensions are compatible with all existing code.

* SSE5 supports 3 operands for new instructions only. AVX extends existing instructions to 3 operands as well. Almost all existing instructions on XMM registers are extended to 3 operands, and the code format makes room for also extending general-purpose register instructions to 3 operands.

* SSE5 supports instructions with 4 operands, but only if two of the operands are the same register. AVX supports any combination of 4 registers by adding an extra code byte. Future extension to 5 operands is possible.

I see no reason that later SSE5.x can't add support for 3-operand or 4-operand to other SSE media instructions by using the immediate byte like AVX does. The only reason that it doesn't seems to be that current SSE5 attempts to make minimum changes to existing instruction formats.

* SSE5 makes instructions longer. AVX makes some instructions longer and some instructions shorter, but most instructions keep the same length as before despite containing one more register operand and other new information.

SSE5 does not make instructions longer. SSE5 instructions are as long as SSE3 instructions (2-byte prefix plus 1-byte opcode). AVX makes some SSE3/SSE4 instructions 1-byte shorter under 64-bit mode because it absorbs the functionality of REX. In all other cases the AVX format makes instructions longer.

* SSE5 adds yet another complication to the already very complicated instruction decoding procedure. AVX makes instruction decoding simpler by sanitizing a lot of old patches. The many prefixes and escape bytes that pester the current instruction set are joined together into a single "VEX" prefix that is 2 or 3 bytes long.

This is completely false. SSE5 is so simple that it can be described in 2 pages. AVX is so complex that even small things like register names and formats differ between operating modes or instructions. Some AVX instructions can use both the 2-byte and the 3-byte VEX, while some can use only the 3-byte VEX. Sometimes a register in AVX is stored in one's complement and sometimes it is not. Sometimes AVX can specify a memory argument at a given position and sometimes it cannot. Sorry, but there is no "sanitation" at all, only deliberate complication.

* AVX supports the extension of the 128-bit vector registers (XMM registers) to 256 bits (YMM registers) with room for further extensions in the future. SSE5 has no room for new extensions.

* AVX has 3 unused bits for future extensions to the now overloaded opcode map. This means no new shortsighted patches for a foreseeable future.

I don't see why SSE5 can't have room for further extension. There are unused opcodes and unused prefixes available. If AMD wanted to, they could even recycle opcodes from 3DNow for SSE5.x. Besides, the merit of an ISA extension is not in how much it can be extended, but in how useful its extensions are.

Before I saw the AVX documentation, I would have denied that it was possible to add so much new information without making instructions longer. The trick is that it makes one long prefix instead of many short prefixes. One or a few bits in the new VEX prefix contains the same information as a whole 8-bit or even 16-bit prefix or escape code in the current coding scheme. The two VEX prefixes are made out of two obsolete instructions, LDS and LES, which are valid in 16- and 32-bit mode but invalid in 64-bit mode. Certain bits in the VEX prefix that indicate register extensions available only in 64-bit mode are placed in such a way in the VEX prefix that the only values valid in 32-bit mode form an invalid register operand if interpreted as a legacy LDS or LES instruction. This is a solution no less ingenious than the x64 extension invented by AMD.

VEX is basically a way for Intel to say "sorry we messed up the instructions from MMX to SSE to SSE4.1 and SSE4.2, now we're going to fix them up by messing up the instruction format a bit more." The problem is not how the instructions are crammed into a 3-byte word, but how the instructions have overlapping and specialized functionality among them.

To add insult to injury (from the perspective of ISA quality), there are no "different formats of identical instructions" across AVX and SSE5 (because the instructions in the two are different), but there are different formats of identical instructions within AVX alone. "Ingenious," indeed. Like AMD64, NO.

Looking at the advantages of AVX over SSE5 there can be no doubt that AMD has no choice but to adopt AVX. There is no way AMD can stay in competition without supporting the new 256-bit vectors and the 3-operand version of all existing XMM instructions. And, incidentally, it will be easier to implement the new 3-operand instructions for AMD than it is for Intel because the current Intel microarchitecture does not allow micro-operations with more than two inputs, while the AMD microarchitecture has no such limitation.

The only "advantage" of AVX over SSE5 is 256-bit registers and 4-operand operations. However, AVX also has less powerful compare/permutation instructions, but more semantic and syntactic restrictions on the use of its instructions than SSE5. Furthermore, it will probably be easier for AMD to implement SSE5 than for Intel to implement AVX. I seriously doubt that Intel made AVX so complicated to ensure that nobody (else) can implement it easily. A good SSE5 and SSEplus implementation will be easier to use and cheaper to implement than AVX.

Let me explain the advantage of 3-operand instructions to those who don't know what this is about. Most of the current instructions place the result of a calculation in the same register as one of the input operands, e.g.:
A = A * B.
With a 3-operand version, you can do:
C = A * B.
This gives the programmer the freedom to reuse the original value of A in other calculations without having to copy it to another register. The result is fewer register-to-register moves and hence more efficient and compact code.
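The register-copy argument above can be illustrated with a toy model in C. The machine, its operation names and the instruction counter are hypothetical; they only mimic the destructive two-operand form (d = d op s) and the non-destructive three-operand form (d = a op b). The task is to compute C = A * B and D = A + B while keeping A and B intact.

```c
#include <assert.h>

/* Toy register file; count records how many instructions were executed. */
typedef struct {
    double r[8];
    int count;
} machine_t;

/* Destructive two-operand forms: d = d op s. */
static void mov(machine_t *m, int d, int s)  { m->r[d] = m->r[s];  m->count++; }
static void mul2(machine_t *m, int d, int s) { m->r[d] *= m->r[s]; m->count++; }
static void add2(machine_t *m, int d, int s) { m->r[d] += m->r[s]; m->count++; }

/* Non-destructive three-operand forms: d = a op b. */
static void mul3(machine_t *m, int d, int a, int b) { m->r[d] = m->r[a] * m->r[b]; m->count++; }
static void add3(machine_t *m, int d, int a, int b) { m->r[d] = m->r[a] + m->r[b]; m->count++; }

void with_two_operands(machine_t *m, int A, int B, int C, int D)
{
    /* C and D must be separate registers, otherwise A or B gets clobbered. */
    assert(C != A && C != B && D != A && D != B);
    mov(m, C, A); mul2(m, C, B);   /* copy A first, because mul2 overwrites C */
    mov(m, D, A); add2(m, D, B);
}

void with_three_operands(machine_t *m, int A, int B, int C, int D)
{
    mul3(m, C, A, B);
    add3(m, D, A, B);
}
```

In this model the two-operand version needs four instructions and the three-operand version two; the difference is exactly the two register-to-register moves.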

The SSE5 instructions will suffer the same fate as AMD's 3DNow instructions. Nobody ever used the 3DNow instructions because they are not supported in Intel processors. They are superseded by the more efficient SSE instructions, but AMD have to keep supporting them in all their future processors for the sake of backwards compatibility. Let's hope that AMD have the guts to drop SSE5 altogether before it's too late. There has been some speculation that they might.

Too bad that AMD haven't seen this coming before they published their SSE5 spec. Intel must have been able to keep their plans secret despite the patent sharing agreement between AMD and Intel. Maybe there is no patent on AVX?

I don't think you understand the difference between SSE5 and AVX, or between SSE5 and 3DNow. Had you actually studied and understood it, you'd have found that SSE5 instructions are very generic. They are applicable to a wide range of situations. AVX instructions, OTOH, are quite specific and have operational requirements that make you wonder where they came from. In terms of ISA design SSE5 is clearly better than AVX.

As for 3DNow, it was not comparable to SSE5 in depth or width at all. There were fewer than 20 instructions in 3DNow, and they were all focused on floating-point operations. In contrast, SSE5 offers some 80 new instructions (not counting different operating modes of the same opcode) covering all kinds of generic operations. The number of new instructions offered by SSE5 is even larger than that offered by AVX. Your "claim" is totally contrary to reality, and thus I don't believe your "prediction" has any credibility either.

agner · ‎08-23-2008

I am not talking about which instructions are more useful, I am talking about the way they are encoded. The AVX scheme has plenty of room for future extensions without making instructions longer: The L bit makes room for YMM registers, the three unused mmm bits make space for new opcodes, the two unused pp bits make space for other operand types or still larger registers or whatever, the four unused bits in the immediate byte allow for a fifth operand. The DREX scheme has none of this. The AVX scheme makes non-destructive forms of all XMM instructions. The SSE5 instructions do not have non-destructive forms because two of the operands must be the same register.
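The fields mentioned above can be read directly out of the prefix. A minimal C sketch of decoding the 2-byte (C5) and 3-byte (C4) VEX prefixes, following the bit layout in Intel's AVX documentation: R, X, B and vvvv are stored inverted, L selects 128 or 256 bits, and the m-mmmm map field has only the values 1 to 3 defined, which leaves its upper three bits spare. The struct and function names are hypothetical; the fourth register of the 4-operand form is taken from bits 7:4 of the immediate byte, leaving bits 3:0 free.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned r, x, b;  /* register-extension bits (stored inverted in VEX) */
    unsigned map;      /* m-mmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A */
    unsigned w;
    unsigned vvvv;     /* extra source register (stored inverted) */
    unsigned l;        /* 0 = 128-bit XMM, 1 = 256-bit YMM */
    unsigned pp;       /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
} vex_fields;

/* Decode a VEX prefix starting at p (p[0] is C4 or C5). */
vex_fields decode_vex(const uint8_t *p)
{
    vex_fields v = {0};
    assert(p[0] == 0xC4 || p[0] == 0xC5);
    if (p[0] == 0xC5) {
        /* 2-byte form: R vvvv L pp; implies map 0F, X = B = 0, W = 0 */
        v.r = !(p[1] & 0x80);
        v.map = 1;
        v.vvvv = (~(unsigned)p[1] >> 3) & 0xFu;
        v.l = (p[1] >> 2) & 1u;
        v.pp = p[1] & 3u;
    } else {
        /* 3-byte form: R X B m-mmmm, then W vvvv L pp */
        v.r = !(p[1] & 0x80);
        v.x = !(p[1] & 0x40);
        v.b = !(p[1] & 0x20);
        v.map = p[1] & 0x1Fu;
        v.w = p[2] >> 7;
        v.vvvv = (~(unsigned)p[2] >> 3) & 0xFu;
        v.l = (p[2] >> 2) & 1u;
        v.pp = p[2] & 3u;
    }
    return v;
}

/* Fourth register operand in the 4-operand form: bits 7:4 of the immediate. */
unsigned is4_register(uint8_t imm8)
{
    return imm8 >> 4;
}
```

For example, the prefix C4 E2 7D decodes to map 0F38, L = 1 (YMM) and implied prefix 66.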

avk · ‎09-02-2008

The only way to persuade a big bad boy like Intel is to do something like what AMD did with its own x86-64. I think it would be reasonable to develop a RISC-like ISA (let's call it, say, RISC86), compatible with x86-64 at the level of instruction mnemonics, but not compatible at the binary level. If all the major players in the x86 market (both software and hardware: Microsoft, the GNU/Linux developers, AMD, VIA/Centaur, nVidia; I'm not quite sure about Intel) supported this new RISC86 ISA by creating a public standardization committee, it would be an ideal situation. In this case many problems could be solved.

agner · ‎09-03-2008

A new RISC instruction set? Intel did that first with Itanium, and it was not a big success. The market wants compatibility, not on the assembly language level but on the binary level. A new ISA will have to be compatible with existing code. A microprocessor that supports both x86 and RISC86 would probably be accepted by the market, but it might end up running x86 code faster than RISC86 because the CISC code takes less space in the code cache than RISC code.

avk · ‎09-03-2008

Well, by RISC86 I meant just another new operating mode in addition to the three that already exist: x86-16, x86-32, x86-64... and RISC86. We all know that each of the three existing x86 modes (16, 32, 64 bits) has its own decoding rules (I mean for the binary code), so why not start from scratch and eliminate (almost?) all the x86 problems in a fourth, new RISC86 mode? Namely:

1) different instruction lengths;
2) ugly x87 instructions;
3) ugly instruction encoding with lots of escape prefixes;
4) a brake, a.k.a. rFLAGS, which significantly slows down instruction execution;
5) no more than two operands for most instructions;
6) what else?

About Itanium: I would say that Intel, this BBB (big bad boy), does want a lot of money for its IA64, but AMD doesn't for its x86-64 at all. That's the huge difference. Have you seen Guy Ritchie's movie "Revolver"? If so, do you remember what Lord John said: "But there is no angel as destructive as their greed, in the end she gets them all." I would like to repeat another phrase (not from the movie): "World needs open standards." Do you remember which ones they are?

About instruction cache capacity: I'm not quite sure that, in general, CISC code will be much more compact than RISC86 code. Maybe slightly.

agner · ‎09-03-2008

I don't see the advantage of a new ISA encoding scheme. If you look at the disassembly of a typical program, you will see that more than 90% of the instructions are very short instructions like push, pop, call, ret, mov, inc, add, cmp, conditional jump, etc. If you make a RISC encoding for the same instructions, you would need 8 bytes for each instruction. This would increase the code size by a factor of approximately 3. And you would be unable to code instructions that have a 64-bit immediate, such as MOV RAX,big_constant. You would see the capacity of your code cache reduced by a factor of 3 and hardly any gain in decoding speed. The VEX system simplifies decoding by allowing only certain specific instruction lengths. This is a reasonable compromise between RISC and CISC in my opinion.

I agree that there are a lot of things that could be cleaned up and sanitized if a new mode is introduced for whatever reason. AMD removed the most obviously obsolete instructions when they designed the x64 mode, but there is much more that could be cleaned up. Microsoft tried to ban the old x87 registers in x64 Windows, according to the first preliminary specs, but for some reason they changed their mind and supported x87 in Win64.

Things that could be cleaned up if a new mode is defined:

Get rid of x87 and MMX
Most of the instructions that write to partial registers, e.g. MOV AX,BX; SETE AL; or partial flags, e.g. INC; should clear the rest of the register or flags to remove false dependences.
The single-byte short form of XCHG instructions should be replaced by the two-byte general form because they are very rarely used. I have never seen XCHG instructions in compiler-generated code. This would free the bytes 0x91-0x97 in the opcode map. If these bytes were reclaimed as VEX prefixes, we would have three more opcode bits without making instructions longer.
The Win64 ABI could use a revision; the Linux64 ABI is more efficient.
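The partial-register item in the list can be modelled in a few lines. In x86-64, writing a 32-bit register zero-extends into the full 64-bit register, whereas writing an 8- or 16-bit part (as MOV AX,BX or SETE AL do) keeps the remaining bits, so the result depends on the register's old value. The sketch below is a simplified model of that rule with a made-up helper name, `write_reg`; it does not model any particular CPU's renaming hardware, nor the AH/BH/CH/DH high-byte registers.

```python
MASK64 = (1 << 64) - 1


def write_reg(old: int, value: int, width: int) -> tuple[int, bool]:
    """Model an x86-64 write of `width` bits into the low part of a 64-bit register.

    Returns (new 64-bit value, depends_on_old).
    32- and 64-bit writes do not need the old value; 8- and 16-bit writes merge
    with it, which is the false dependency discussed above.
    """
    if width == 64:
        return value & MASK64, False
    if width == 32:
        return value & 0xFFFFFFFF, False              # upper 32 bits are cleared
    if width in (8, 16):
        low_mask = (1 << width) - 1
        merged = (old & ~low_mask & MASK64) | (value & low_mask)
        return merged, True                            # must read the old value to merge
    raise ValueError("width must be 8, 16, 32 or 64")
```

Under the proposal in the list, 8- and 16-bit writes would behave like the 32-bit case, and the `True` branch would disappear.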

However, these advantages are not sufficient for justifying yet another CPU mode and yet another software standard. But if a new mode is introduced for some other reason then these changes should be included.

avk · ‎09-04-2008

Yes, it is well known that more than 90% of x86 instructions in most applications have very short forms, but I don't think that their RISC86 equivalents will need 8 bytes. I think 4 bytes will be enough. If RISC86 provides more than 2 operands for most instructions, fewer instructions will be needed to do the same work, so the code size will shrink too.
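avk's point about more than two operands can be illustrated with the standard lowering of three-address code to a destructive two-operand form: whenever the destination differs from the first source, a copy has to be inserted. The sketch below counts instructions for both forms; the operation tuples, register names and the scratch register `tmp` are all hypothetical.

```python
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def lower_to_two_operand(ops):
    """Lower three-address ops (op, dst, src1, src2) to two-operand form.

    Two-operand form `(op, dst, src)` means dst = dst op src, as in x86.
    'tmp' is a hypothetical scratch register assumed to be free.
    """
    out = []
    for op, dst, a, b in ops:
        if dst == a:
            out.append((op, dst, b))                  # already destructive-friendly
        elif dst == b and op in COMMUTATIVE:
            out.append((op, dst, a))                  # dst = b op a gives the same result
        elif dst == b:
            # Non-commutative op whose destination aliases the second source.
            out += [("mov", "tmp", a), (op, "tmp", b), ("mov", dst, "tmp")]
        else:
            out += [("mov", dst, a), (op, dst, b)]    # copy first, then operate
    return out


program = [
    ("add", "r3", "r1", "r2"),   # r3 = r1 + r2
    ("sub", "r4", "r1", "r3"),   # r4 = r1 - r3
    ("add", "r1", "r1", "r4"),   # r1 = r1 + r4
]
```

Here the three-operand program has 3 instructions and the two-operand version has 5. Whether that also shrinks the code in bytes depends on the encoding: at avk's proposed 4 bytes each it is 12 versus 20 bytes, but the comparison with real x86 depends on how short the equivalent x86 encodings already are.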

About VEX: it is a palliative, IMHO.

About instructions with immediate operands: although the minimum size of a RISC86 instruction will be 4 bytes, I don't mean that it will also be the maximum. Yes, it is not pure RISC, but who cares? Why not use just 2 bits in the 32-bit RISC86 instruction to define its full length (2^2 = 4 values: 0 = 1 dword, 1 = 2 dwords, 2 = 3 dwords, 3 = 4 dwords)? Instructions will then have various lengths, but with these two bits the length will be very simple to recognize. An immediate and/or a displacement will lie in the next 32/64-bit cells after the instruction itself. I must admit that I'm not quite sure these 2 bits are needed; maybe CPU architects will find another method to calculate instruction length.
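avk's 2-bit length field is easy to sketch. The version below assumes, purely for illustration, that the field occupies the two lowest bits of the first 32-bit word and that code is stored as little-endian dwords; avk does not say where the bits would go, and RISC86 is a hypothetical ISA, so `split_instructions` is a made-up helper.

```python
import struct


def split_instructions(code: bytes) -> list[list[int]]:
    """Split a RISC86-style code stream into instructions.

    Hypothetical encoding: the low 2 bits of each instruction's first dword
    give its total length minus one, in dwords (0 -> 1 dword ... 3 -> 4 dwords).
    """
    if len(code) % 4:
        raise ValueError("code must be a whole number of dwords")
    words = list(struct.unpack(f"<{len(code) // 4}I", code))
    out, i = [], 0
    while i < len(words):
        n = (words[i] & 0b11) + 1                  # total length in dwords
        if i + n > len(words):
            raise ValueError(f"truncated instruction at dword {i}")
        out.append(words[i:i + n])                 # opcode dword + immediates/displacements
        i += n
    return out


# Three instructions: 1 dword, 2 dwords (with a 32-bit immediate), 4 dwords.
stream = struct.pack("<7I", 0x10, 0x21, 0xDEADBEEF, 0x33, 1, 2, 3)
```

Because the length is known after reading a single dword, a decoder can locate the start of the next instruction without parsing prefixes or ModR/M bytes, which is the simplification avk is after.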

Pmoll · ‎04-27-2009

Guys, I need some help.

Install x86_64 on AMD problem

I just built a new computer with an AMD Athlon XP 2600+ (Barton) on an Asus A7N8X-X-UAY motherboard, 512 MB of RAM and an 80 GB hard disk. I want a dual-boot system, so I first installed W98. No problems. I used FIPS to shrink the Windows partition and created a new partition for Linux. All went well. I downloaded the Fedora Core 1 (Yarrow x86_64) ISO and burned it onto a CD. When I booted from the CD to install Fedora, I got the message "Your CPU does not support long mode. Use a 32 bit distribution." The install will not move forward. Should I install i386? Please help.

avk · ‎04-27-2009

That's right, the Athlon XP does not support long (64-bit) mode. Therefore, you should install the i386 (32-bit) version. If you are interested in exactly what your CPU is capable of, you can run the CPU-Z utility.
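The "long mode" the installer checks for is reported by CPUID leaf 0x80000001, bit 29 of EDX (the LM flag). On a running x86 Linux system, the kernel exposes the same flag as `lm` in the `flags` line of `/proc/cpuinfo`, so a quick alternative to CPU-Z is a check like the following sketch. The helper name is made up for this example, and it only parses text; it does not execute CPUID itself.

```python
def supports_long_mode(cpuinfo_text: str) -> bool:
    """Return True if every 'flags' line in /proc/cpuinfo-style text lists 'lm'.

    Hypothetical helper for x86 Linux; 'lm' mirrors CPUID 0x80000001 EDX bit 29.
    """
    flag_lists = [
        line.partition(":")[2].split()
        for line in cpuinfo_text.splitlines()
        if line.startswith("flags")
    ]
    # No flags lines at all (e.g. empty or non-x86 input): report False.
    return bool(flag_lists) and all("lm" in flags for flags in flag_lists)
```

On Linux it can be called as `supports_long_mode(open("/proc/cpuinfo").read())`.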

backlinkbuilder · ‎05-18-2009

I ran into the same issue as agner. Thanks for the help, guys, I really do appreciate it. I was about to start pulling my hair out.

eduardoschardong · ‎05-19-2009

Originally posted by: backlinkbuilder I ran into the same issue as agner. Thanks for the help, guys, I really do appreciate it. I was about to start pulling my hair out.

This bot makes me laugh. Adding it to my favorites.

sj1009 · ‎06-02-2009

Good idea, Agner! Alas, I think that Intel won't take any steps to create such a public standardization committee.


agner · ‎06-02-2009

Originally posted by: sj1009 I think that Intel won't take any steps to create such a public standardization committee.

Of course not. The present situation gives Intel a big competitive advantage over AMD. That's why somebody has to put pressure on Intel to cooperate. I don't know if AMD can sue Intel for unfair competition. I think they have already tried, but I'm not sure. In any case, that would be difficult and expensive and would take a very long time.

The pressure might come from government organizations, from the EU, from professional organizations like IEEE, from the software industry, from IT journalists, from political organizations, ...

avk · ‎06-03-2009

agner: You said "from the software industry". I think that could be the charcoal that ignites the petition's bonfire, and you could be the match.

When I mentioned an online petition, I meant www.petitiononline.com, the best-known site of that kind. Would you try to start this petition? I think your first message in this topic would be a perfect basis for it.

agner · ‎06-04-2009

Thanks for the reference to www.petitiononline.com

The issue is very technical. Only people who understand it would sign it. People who don't understand would just say that competition is good, because that's what they have been brainwashed to believe. Well, I don't doubt that the competition between Intel and AMD has been good for price and quality - but not for compatibility.

Do you really think that Intel would change their profitable policy if a few hundred people sign a petition?

avk · ‎06-05-2009

I think I understand your doubts about who would or would not sign this petition. Yes, most people are brainwashed, but, I hope, not all of them. IMHO, every asm-coder would sign. All you need is to let them know about the real situation.

About Intel's profitable policy: who knows. As you know, people here in Europe really don't like monopolies (I'm not sure about the rest of the world), so if you describe in your petition how one company prevents another company from selling its products (what about the "GenuineIntel" policy, where a program or library compiled with the Intel compiler doesn't "see" some or all of the SSE extensions on non-Intel CPUs and therefore runs in MMX mode?), this could start a discussion in the worldwide programming community. Will Intel change its policy? It depends on how large the discussion becomes.
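The difference avk describes is between a CPU dispatcher that branches on the CPUID vendor string and one that branches on the feature flags themselves. The sketch below contrasts the two; the function names and code-path labels are hypothetical, and the arguments stand in for what CPUID would report (only the vendor strings "GenuineIntel" and "AuthenticAMD" are the real ones).

```python
def dispatch_by_vendor(vendor: str, features: set[str]) -> str:
    """Vendor-gated dispatch: the fast paths are only used on Intel CPUs."""
    if vendor == "GenuineIntel":
        if "sse2" in features:
            return "sse2"
        if "sse" in features:
            return "sse"
    return "generic"  # any other vendor gets the slow path, whatever it supports


def dispatch_by_feature(vendor: str, features: set[str]) -> str:
    """Feature-based dispatch: the vendor string is ignored."""
    for path in ("sse2", "sse"):
        if path in features:
            return path
    return "generic"
```

Both functions receive the same feature set; only the first one penalizes a non-Intel CPU that supports the same instructions, and only changing the vendor string would reveal that.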

agner · ‎06-05-2009

Originally posted by: avk every asm-coder would sign.

There are not many asm coders left. A petition in itself wouldn't move anything, but it might help to get media attention. It would definitely help if we could find an IT journalist willing to run a campaign. It would have to be someone from a major PC magazine or an influential blogger. He would press Intel and AMD for a comment, and he might get comments from various software companies. The petition might give him an excuse to keep writing about it: "Now Professor So-and-so has signed the petition", and he would get a comment from the professor.

Maybe you are right about putting the Intel compiler issue into it. That's something people understand if you simplify it as: "Your software checks if it is running on an Intel machine and punishes you with bad performance if not".

I tried a long time ago to make a journalist write about the Intel compiler issue. He promised to write about it but never did.

Anybody out there know a good IT journalist?

avk · ‎06-05-2009

Do you remember that topic? I posted a link there to Joel Hruska's Ars Technica article about his CPUID manipulation. What about inviting him?

agner · ‎12-05-2009

I have argued for a public standardization of the x86 instruction set at www.agner.org/optimize/blog/read.php?i=25

Comments are welcome

avk · ‎12-07-2009

Just a little correction: 3DNow! was introduced with the AMD K6-2 in 1998, not 1997.

avk · ‎12-12-2009

I believe that your initiative would be more productive in the form of an online petition.

godsic · ‎04-27-2009

to avk:

I fully support you. I propose a RISC CPU that has, for example, 16 pipelines, where execution units, caches and registers can be shared across pipelines! I also suggest direct instructions instead of the instruction->MOPs hierarchy. An instruction could operate on operands up to a cache line in size, and the number of operands could vary.

zenie · ‎05-01-2009

they should have contracts that are not informed hmmmm


agner · ‎05-02-2009

Good news: AMD have changed the specifications for the future SSE5 instructions to make them more compatible with Intel's AVX scheme, see: http://support.amd.com/us/Processor_TechDocs/43479.pdf

To avoid confusion, they have also changed the name SSE5 to XOP, FMA4 and CVT16.

Thank you so much, AMD. Please tell us: will Bulldozer support all three of XOP, FMA4 and CVT16?

Bad news: Intel have recently changed the specs for their FMA instructions, so that the compatibility is lost again.

How ironic:

In the initial preliminary specifications, AMD had 3 different operands on FMA instructions, and Intel had 4 operands.

Now, both companies have revised their specifications: AMD now has 4 operands on FMA instructions and Intel has 3 operands.
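The 3- versus 4-operand question is easiest to see in scalar form. In the 4-operand (FMA4) style, the result goes to a separate destination, d = a*b + c, and all sources survive. In the 3-operand (FMA3) style, the destination is also one of the sources, so Intel defines three variants (VFMADD132, VFMADD213, VFMADD231) whose digits name which operands are multiplied and which one is added, each overwriting operand 1. The Python below only models these semantics on plain floats, with ordinary two-step rounding rather than a single fused rounding; the function names are made up, and this is neither an encoding nor an API.

```python
def fma4(a: float, b: float, c: float) -> float:
    """4-operand style (e.g. AMD VFMADDPS dst, a, b, c): dst = a*b + c, sources kept."""
    return a * b + c


def fma3(form: str, regs: dict[str, float], op1: str, op2: str, op3: str) -> None:
    """3-operand style (Intel VFMADD132/213/231): the result overwrites op1."""
    x1, x2, x3 = regs[op1], regs[op2], regs[op3]
    if form == "132":
        regs[op1] = x1 * x3 + x2      # op1*op3 + op2
    elif form == "213":
        regs[op1] = x2 * x1 + x3      # op2*op1 + op3
    elif form == "231":
        regs[op1] = x2 * x3 + x1      # op2*op3 + op1
    else:
        raise ValueError("form must be '132', '213' or '231'")
```

This is the practical difference between the two forms: FMA4 keeps all three inputs, while every FMA3 variant overwrites one of them, and the three variants let the programmer choose which input is sacrificed.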

Apparently, Intel are to blame for not informing AMD in time about this change. They certainly knew that AMD planned to make compatible instructions because there have been patent sharing negotiations about this issue.

Maintaining compatibility seems to be a game of running after a moving target, as long as both companies keep secrets from each other rather than cooperating.

Can somebody from AMD please comment on how you will react to Intel's latest change in their FMA spec. Will future AMD processors use 3 or 4 operands on FMA instructions, or support both forms?

edward_yang · ‎05-02-2009

As far as I can see, semantic compatibility is moot because these instruction set extensions (from AMD and Intel) are not syntactically compatible anyway.

BTW, I don't think you can patent an instruction format (i.e., an encoding), only its implementation.

agner · ‎05-03-2009

Some of the new AMD instructions are different from anything Intel has, for example the half-precision floating-point calculations. Some instructions are identical in every way, for example PTEST. Some instructions serve the same need but are slightly different, for example AMD's VPCMOV and Intel's BLEND instructions.
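The VPCMOV/BLEND pair shows what "slightly different" means in practice. Roughly, AMD's VPCMOV selects individual bits from two sources under a bit mask, whereas Intel's variable blends such as PBLENDVB pick whole elements (bytes, for PBLENDVB) according to the top bit of each mask element. The sketch models both on short lists of bytes; the function names are made up, and it is a semantic model only, not a performance or encoding model.

```python
def vpcmov(src1: list[int], src2: list[int], sel: list[int]) -> list[int]:
    """Bitwise select: take bits of src1 where the sel bit is 1, else bits of src2."""
    return [(a & s) | (b & ~s & 0xFF) for a, b, s in zip(src1, src2, sel)]


def pblendvb(dst: list[int], src: list[int], mask: list[int]) -> list[int]:
    """Per-byte blend: take the src byte where bit 7 of the mask byte is set, else keep dst."""
    return [s if m & 0x80 else d for d, s, m in zip(dst, src, mask)]


a = [0xF0, 0xAA]
b = [0x0F, 0x55]
mask = [0xFF, 0x0F]
```

With a mask byte of 0x0F the two disagree: VPCMOV mixes bits from both sources (0x5A), while the blend keeps the whole destination byte (0x55) because bit 7 of the mask is clear. With all-ones or all-zeros mask elements, as produced by a vector compare, they give the same result.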

AMD changed the coding of their FMA instructions to make them fully compatible with Intel's instructions. Unfortunately, Intel changed the specs for their FMA instructions so that compatibility is lost once again.

This situation is intolerable to the software community. We must find a way to standardize the x86 instruction set and force AMD and Intel to cooperate on instruction formats rather than playing tricks on each other for the sake of short term PR gains.

avk · ‎05-03-2009

Agner: It seems that AMD has heard you. But, alas, Intel pulled a dirty trick.

agner · ‎05-06-2009

Originally posted by: avk Agner: It seems that AMD has heard you. But, alas, Intel pulled a dirty trick.

Yes, AMD have certainly done what I have argued so heavily for here and elsewhere - thank you very much for that - but I have no idea whether they would have done the same had I not voiced my opinion. I don't know what the motives behind Intel's change were.

All these problems could be avoided if we had a public forum for discussion of new instructions, supported by both Intel and AMD.


AMD and Intel incompatible - What to do?