

agner
Adept I

AMD and Intel incompatible - What to do?

Originally posted by: yeyang

Self-quoting doesn't make what you said a bit more credible.



I expected my readers to be sufficiently computer-literate to be able to click on a link so I didn't have to write the same long list of arguments again. Please follow the link and read the discussion thread on Aces hardware forum. Here is the link again:

http://aceshardware.freeforums.org/intel-avx-kills-amd-sse5-t538.html

0 Likes
yeyang
Journeyman III

AMD and Intel incompatible - What to do?

The thing is your arguments there are not correct and I see no reason that anyone should take your words for granted. Besides, isn't copy-and-paste some basic computer skill? It's so easy and I'll do it for you here. I'll even comment on them in-line.

When AMD published their new ISA extension named SSE5 in late August 2007, they also introduced a new instruction code format for instructions with 3 or 4 operands. When Intel presented their AVX extension in April this year they introduced another code format that also supports 3 or 4 operands. These two formats are very different. We are now in a position where AMD and Intel are using completely different coding schemes for the same instructions.


Actually it is Intel that is using a completely different coding scheme for AVX, and the new scheme completely replaces the instruction formats of SSE through SSE4.2. In contrast, AMD's SSE5 simply adds a DREX byte to the instructions and leaves the format of SSE through SSE4a intact.

This is every programmer's nightmare! I cannot imagine any significant number of programmers making three versions of their code: one for AMD, one for Intel, and one for compatibility with older processors.


In any case this is not the programmer's problem, because compilers should take care of that.

The forking of instruction sets and coding schemes is one of the less desirable consequences of free competition. We would all prefer some kind of international standardization committee that could approve new instruction codes. Such a committee would be reluctant to accept new shortsighted patches that add just another complication to instruction decoding. They would have weeded out the bizarre undocumented instructions from the old 8086 days that are still supported. And they might not accept the addition of new instructions to the already bulging instruction set mainly for marketing reasons with little technical benefit. Unfortunately, there is little hope that such a committee will be formed.


You first say we should do something, then say that the something has little hope. I don't know what your logic is or what you are talking about.

I have looked into the details of the two competing instruction formats and made a comparison:

* Both ISA extensions are compatible with all existing code.

* SSE5 supports 3 operands for new instructions only. AVX extends existing instructions to 3 operands as well. Almost all existing instructions on XMM registers are extended to 3 operands, and the code format makes room for also extending general-purpose register instructions to 3 operands.

* SSE5 supports instructions with 4 operands, but only if two of the operands are the same register. AVX supports any combination of 4 registers by adding an extra code byte. Future extension to 5 operands is possible.


I see no reason that later SSE5.x can't add support for 3-operand or 4-operand to other SSE media instructions by using the immediate byte like AVX does. The only reason that it doesn't seems to be that current SSE5 attempts to make minimum changes to existing instruction formats.

* SSE5 makes instructions longer. AVX makes some instructions longer and some instructions shorter, but most instructions keep the same length as before despite containing one more register operand and other new information.


SSE5 does not make instructions longer. SSE5 instructions are as long as SSE3 instructions (2-byte prefix plus 1-byte opcode). AVX makes some SSE3/SSE4 instructions 1-byte shorter under 64-bit mode because it absorbs the functionality of REX. In all other cases the AVX format makes instructions longer.

* SSE5 adds yet another complication to the already very complicated instruction decoding procedure. AVX makes instruction decoding simpler by sanitizing a lot of old patches. The many prefixes and escape bytes that pester the current instruction set are joined together into a single "VEX" prefix that is 2 or 3 bytes long.


This is completely false. SSE5 is so simple that it can be described in 2 pages. AVX is so complex that even small things like register names and formats are different for different operating modes or instructions. Some AVX instructions can use both the 2-byte and the 3-byte VEX, some can use only the 3-byte VEX. Sometimes a register field in AVX is stored in one's complement and sometimes it is not. Sometimes AVX can specify a memory argument at a given operand position and sometimes it cannot. Sorry, but there is no "sanitation" at all, only deliberate complication.

* AVX supports the extension of the 128-bit vector registers (XMM registers) to 256 bits (YMM registers) with room for further extensions in the future. SSE5 has no room for new extensions.

* AVX has 3 unused bits for future extensions to the now overloaded opcode map. This means no new shortsighted patches for a foreseeable future.


I don't see why SSE5 can't have room for further extension. There are unused opcodes and unused prefixes available. If AMD so wanted, they could even recycle opcodes from 3DNow for SSE5.x. Besides, the merit of an ISA extension is not in how much it can be extended, but in how useful its extensions are.

Before I saw the AVX documentation, I would have denied that it was possible to add so much new information without making instructions longer. The trick is that it makes one long prefix instead of many short prefixes. One or a few bits in the new VEX prefix contains the same information as a whole 8-bit or even 16-bit prefix or escape code in the current coding scheme. The two VEX prefixes are made out of two obsolete instructions, LDS and LES, which are valid in 16- and 32-bit mode but invalid in 64-bit mode. Certain bits in the VEX prefix that indicate register extensions available only in 64-bit mode are placed in such a way in the VEX prefix that the only values valid in 32-bit mode form an invalid register operand if interpreted as a legacy LDS or LES instruction. This is a solution no less ingenious than the x64 extension invented by AMD.
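The LDS/LES trick described above can be sketched in a few lines of Python. This is only an illustration, not a decoder: the function name `classify_c4_c5` is made up for this example, and the rule it encodes (outside 64-bit mode, the byte after 0xC4/0xC5 marks a VEX prefix only when its top two bits are both set, because LES/LDS accept only a memory operand) is my reading of the published encoding.

```python
def classify_c4_c5(opcode: int, next_byte: int, long_mode: bool) -> str:
    """Classify a 0xC4/0xC5 byte as a VEX prefix or a legacy LES/LDS opcode.

    Hypothetical helper for illustration only.
    """
    if opcode not in (0xC4, 0xC5):
        raise ValueError("only 0xC4 and 0xC5 are handled here")
    legacy = "LES" if opcode == 0xC4 else "LDS"
    vex = "VEX3" if opcode == 0xC4 else "VEX2"
    if long_mode:
        # LES/LDS are invalid in 64-bit mode, so the byte is always a VEX prefix.
        return vex
    # In 16/32-bit mode the next byte would be the ModRM byte of LES/LDS.
    # mod == 11b would select a register operand, which LES/LDS do not allow,
    # so that bit pattern is free to mean "this is a VEX prefix".
    mod = (next_byte >> 6) & 0b11
    return vex if mod == 0b11 else legacy
```

For example, `classify_c4_c5(0xC4, 0x06, long_mode=False)` is treated as a legacy LES with a memory operand, while the same two bytes in 64-bit mode can only be the start of a VEX prefix.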


VEX is basically a way for Intel to say "sorry we messed up the instruction formats from MMX to SSE to SSE4.1 and SSE4.2, now we're going to fix them up by messing up the instruction format a bit more." The problem is not how the instructions are crammed into a 3-byte word, but how the instructions have overlapping and specialized functionality among them.

To add insult to injury (from the perspective of ISA quality), there are no "different formats of identical instructions" between AVX and SSE5 (because the instructions in the two are different), but there are different formats of identical instructions within AVX alone. "Ingenious," indeed. Like AMD64? NO.

Looking at the advantages of AVX over SSE5 there can be no doubt that AMD has no choice but to adopt AVX. There is no way AMD can stay in competition without supporting the new 256-bit vectors and the 3-operand version of all existing XMM instructions. And, incidentally, it will be easier to implement the new 3-operand instructions for AMD than it is for Intel because the current Intel microarchitecture does not allow micro-operations with more than two inputs, while the AMD microarchitecture has no such limitation.


The only "advantage" of AVX over SSE5 is 256-bit registers and 4-operand operations. However, AVX also has less powerful compare/permutation instructions and more semantic and syntactic restrictions on the use of its instructions than SSE5. Furthermore, it will probably be easier for AMD to implement SSE5 than for Intel to implement AVX. I seriously doubt that Intel made AVX so complicated to ensure that nobody (else) can implement it easily. A good SSE5 and SSEplus implementation will be easier to use and cheaper to implement than AVX.

Let me explain the advantage of 3-operand instructions to those who don't know what this is about. Most of the current instructions place the result of a calculation in the same register as one of the input operands, e.g.:
A = A * B.
With a 3-operand version, you can do:
C = A * B.
This gives the programmer the freedom to reuse the original value of A in other calculations without having to copy it to another register. The result is fewer register-to-register moves and hence more efficient and compact code.
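To make the saving concrete, here is a toy register machine in Python. It is only a sketch: the `run` helper and the tuple format `(op, dst, src1, src2)` are invented for this example and do not correspond to any real instruction encoding.

```python
def run(program, regs):
    """Execute a list of (op, dst, src1, src2) tuples on a copy of the register dict."""
    regs = dict(regs)
    for op, dst, a, b in program:
        if op == "mov":
            regs[dst] = regs[a]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        else:
            raise ValueError("unknown op: " + op)
    return regs

# Two-operand style: the destination is also the first source,
# so A has to be copied before the multiply if we want to keep it.
two_operand = [
    ("mov", "C", "A", None),  # C = A      (extra register-to-register move)
    ("mul", "C", "C", "B"),   # C = C * B
]

# Three-operand style: the destination is independent of the sources.
three_operand = [
    ("mul", "C", "A", "B"),   # C = A * B
]
```

Both programs leave the same values in A, B and C, but the three-operand version needs one instruction instead of two.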


The SSE5 instructions will suffer the same fate as AMD's 3DNow instructions. Nobody ever used the 3DNow instructions because they are not supported in Intel processors. They are superseded by the more efficient SSE instructions, but AMD have to keep supporting them in all their future processors for the sake of backwards compatibility. Let's hope that AMD have the guts to drop SSE5 altogether before it's too late. There has been some speculation that they might.

Too bad that AMD haven't seen this coming before they published their SSE5 spec. Intel must have been able to keep their plans secret despite the patent sharing agreement between AMD and Intel. Maybe there is no patent on AVX?


I don't think you understand the difference between SSE5 and AVX, or that between SSE5 and 3DNow. Had you actually studied it and understood it, you'd have found that SSE5 instructions are very generic. They are applicable to a wide range of situations. AVX instructions, OTOH, are quite specific and have operating requirements that make you wonder where they came from. In terms of ISA design SSE5 is clearly better than AVX.

As for 3DNow, it was not comparable to SSE5 in depth or breadth at all. There were fewer than 20 instructions in 3DNow and they were all focused on floating point operations. In contrast, SSE5 offers 80-some new instructions (not counting different operating modes of the same opcode) covering all kinds of generic operations. The number of new instructions offered by SSE5 is even greater than that offered by AVX. Your "claim" has been totally contrary to reality, and thus I don't believe your "prediction" bears any credibility either.
0 Likes
agner
Adept I

AMD and Intel incompatible - What to do?

I am not talking about which instructions are more useful, I am talking about the way they are encoded. The AVX scheme has plenty of room for future extensions without making instructions longer: The L bit makes room for YMM registers, the three unused mmm bits make space for new opcodes, the two unused pp bits make space for other operand types or still larger registers or whatever, the four unused bits in the immediate byte allow for a fifth operand. The DREX scheme has none of this. The AVX scheme makes non-destructive forms of all XMM instructions. The SSE5 instructions do not have non-destructive forms because two of the operands must be the same register.
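For readers who want to see where the fields mentioned above sit, here is a rough Python sketch that unpacks a 3-byte VEX prefix. The bit layout follows the published AVX documentation as far as I know it (the opcode-map field is five bits wide there, written mmmmm); the function name `decode_vex3` and the returned dictionary are made up for this illustration.

```python
def decode_vex3(b0: int, b1: int, b2: int) -> dict:
    """Unpack the fields of a 3-byte VEX prefix (illustrative sketch only)."""
    if b0 != 0xC4:
        raise ValueError("a 3-byte VEX prefix starts with 0xC4")
    return {
        # R, X, B are stored inverted; return the logical value.
        "R": ((b1 >> 7) & 1) ^ 1,
        "X": ((b1 >> 6) & 1) ^ 1,
        "B": ((b1 >> 5) & 1) ^ 1,
        "mmmmm": b1 & 0x1F,           # opcode map select
        "W": (b2 >> 7) & 1,
        "vvvv": ~(b2 >> 3) & 0xF,     # extra source register, stored inverted
        "L": (b2 >> 2) & 1,           # 0 = 128-bit (XMM), 1 = 256-bit (YMM)
        "pp": b2 & 0b11,              # stands in for a legacy prefix
    }
```

With the bytes `C4 E1 74`, this yields L = 1 (a YMM operation), vvvv = 1 (register 1 as the extra source) and mmmmm = 1, all carried in a prefix that replaces what used to be separate prefix and escape bytes.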

0 Likes
avk
Adept III

AMD and Intel incompatible - What to do?

The only way to persuade a big bad boy like Intel is to do something like what AMD did with its own x86-64. I think it would be reasonable to develop a RISC-like ISA (let's name it, say, RISC86), compatible with x86-64 at the level of instruction mnemonics, but not compatible at the binary level. If all the major players in the x86 market (both software and hardware ones: Microsoft, GNU/Linux creators, AMD, VIA/Centaur, nVidia; not quite sure about Intel) supported this new RISC86 ISA by creating a public standardization committee, it would be an ideal situation. In this case many problems could be solved.
0 Likes
agner
Adept I

AMD and Intel incompatible - What to do?

A new RISC instruction set? Intel did that first with Itanium - not a big success. The market wants compatibility, not on the assembly language level but on the binary level. A new ISA will have to be compatible with existing code. A microprocessor that supports both x86 and RISC86 would probably be accepted by the market, but it might end up running x86 code faster than RISC86 because the CISC code takes less space in the code cache than RISC code.

0 Likes
avk
Adept III

AMD and Intel incompatible - What to do?

Well, speaking of RISC86, I meant just one more operating mode in addition to the three that already exist: x86-16, x86-32, x86-64... and RISC86. We all know that each of the three existing x86 modes (16, 32, 64 bits) has its own decoding rules (I mean the binary code), so why not start from scratch and eliminate (almost?) all the x86 problems in a fourth, new RISC86 mode:

1) variable instruction length;
2) the ugly x87 instructions;
3) an ugly instruction encoding with lots of escape prefixes;
4) a brake, a.k.a. rFLAGS, which significantly slows down instruction execution;
5) no more than two operands for most instructions;
6) what else?

About Itanium: I would like to say that Intel, this BBB (big bad boy), wants a lot of money for its IA64, but AMD doesn't want any for its x86-64. That's the huge difference. Did you watch Guy Ritchie's movie "Revolver"? If yes, do you remember what Lord John said? "But there is no angel as destructive as their greed, in the end she gets them all." I would like to repeat another phrase (not from the movie): "World needs open standards." Do you remember which ones they are?

About instruction cache capacity: I'm not quite sure that, in general, CISC code will be much more compact than RISC86 code. Maybe slightly.
0 Likes
agner
Adept I

AMD and Intel incompatible - What to do?

I don't see the advantage of a new ISA encoding scheme. If you look at the disassembly of a typical program you will see that more than 90% of the instructions are very short instructions like push, pop, call, ret, mov, inc, add, cmp, conditional jump, etc. If you make a RISC code for the same instructions, you would need 8 bytes for each instruction. This would increase the code size by approximately a factor of 3. And you would be unable to code instructions that have a 64-bit immediate, such as MOV RAX,big_constant. You will see the capacity of your code cache reduced by a factor of 3 and hardly any gain in decoding speed. The VEX system simplifies decoding by allowing only certain specific instruction lengths. This is a reasonable compromise between RISC and CISC in my opinion.

I agree that there are a lot of things that could be cleaned up and sanitized if a new mode is introduced for whatever reason. AMD removed the most obviously obsolete instructions when they designed the x64 mode, but there is much more that could be cleaned up. Microsoft tried to ban the old x87 registers in x64 Windows, according to the first preliminary specs, but for some reason they changed their mind and supported x87 in Win64.

Things that could be cleaned up if a new mode is defined:

  • Get rid of x87 and mmx
  • Most of the instructions that write to partial registers, e.g. MOV AX,BX; SETE AL; or partial flags, e.g. INC; should clear the rest of the register or flags to remove false dependences.
  • The single-byte short form of XCHG instructions should be replaced by the two-byte general form because they are very rarely used. I have never seen XCHG instructions in compiler-generated code. This would free the bytes 0x91-0x97 in the opcode map. If these bytes were reclaimed as VEX prefixes, we would have three more opcode bits without making instructions longer.
  • Win64 ABI could use a revision. Linux64 ABI is more efficient.
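The partial-register point in the list above can be illustrated with a small Python model. This is a sketch under the assumption that a register is just a 64-bit integer; the function names are invented for this example. The first function models how a 16-bit write behaves today (the upper bits are kept, so the result depends on the old value), the second models the proposed behaviour (the rest of the register is cleared, so the old value does not matter).

```python
MASK64 = (1 << 64) - 1

def write_ax_merging(old_rax: int, value16: int) -> int:
    """Current behaviour of e.g. MOV AX,BX: upper 48 bits of RAX are preserved."""
    return (old_rax & ~0xFFFF & MASK64) | (value16 & 0xFFFF)

def write_ax_clearing(old_rax: int, value16: int) -> int:
    """Proposed behaviour: the rest of the register is cleared.

    old_rax is accepted but ignored, which is exactly the point:
    the new value no longer depends on the previous contents.
    """
    return value16 & 0xFFFF
```

Because `write_ax_merging` needs `old_rax` as an input, the processor has to wait for whatever instruction produced it. That wait is the false dependence that the clearing variant removes.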

However, these advantages are not sufficient for justifying yet another CPU mode and yet another software standard. But if a new mode is introduced for some other reason then these changes should be included.

0 Likes
avk
Adept III

AMD and Intel incompatible - What to do?

Yes, it is well known that more than 90% of the x86 instructions in most applications have very short forms, but I don't think that their RISC86 equivalents will need 8 bytes. I think 4 bytes will be enough. If RISC86 provides more than 2 operands for most instructions, this will reduce the number of instructions needed to do the same work, so the code size will shrink too.

About VEX: it is a palliative, IMHO.

About instructions with immediate operands: although the minimum size of a RISC86 instruction will be 4 bytes, I don't mean that it will also be the maximum. Yes, it is not a plain RISC, but who cares? Why not use just 2 bits in the 32-bit RISC86 instruction to define its full length (2 pow 2 = 4 values: 0 = 1 dword, 1 = 2 dwords, 2 = 3 dwords, 3 = 4 dwords)? Yes, instructions will have various lengths, but using these two bits it will be very simple to recognize them. So, an immediate and/or a displacement will lie in the next 32/64-bit cells after the instruction itself. I must admit that I'm not quite sure about the need for these 2 bits - maybe CPU architects will find another method to calculate instruction length.
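A quick Python sketch of how such a length field could be used to split a code stream. Note the assumptions: the post does not say where the 2 bits live, so this example arbitrarily puts them in the lowest two bits of the first dword, and `split_instructions` is a made-up name.

```python
def split_instructions(dwords):
    """Split a list of 32-bit words into instructions using a 2-bit length field.

    Assumption for this sketch: the length field is the low 2 bits of the
    first dword of each instruction (0 = 1 dword ... 3 = 4 dwords).
    """
    instructions = []
    i = 0
    while i < len(dwords):
        length = (dwords[i] & 0b11) + 1
        if i + length > len(dwords):
            raise ValueError("truncated instruction at dword %d" % i)
        instructions.append(dwords[i:i + length])
        i += length
    return instructions
```

For instance, `split_instructions([0x10, 0x21, 0xDEADBEEF])` gives a one-dword instruction followed by a two-dword instruction whose second dword would hold an immediate or displacement. The boundary of each instruction is known after looking at only its first dword.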
0 Likes
Pmoll
Journeyman III

AMD and Intel incompatible - What to do?

guys need some help..

 

Install x86_64 on AMD problem

Just built new computer with AMD Athlon XP 2600+Barton on Asus A7N8X-X-UAY motherboard, 512 MB of RAM, and an 80 GB HD. Want to have a dual boot system so I first installed W98. No problems. Used FIPS to shrink windows and created a new partition for Linux. All went well. Downloaded Fedora Core 1 (Yarrow x86_64) ISO and burned onto CD. Booted from CD to install Fedora and got message, "Your CPU does not support long mode. Use a 32 bit distribution." Install will not move forward. Should I install i386? Please help.

 

 

0 Likes
avk
Adept III

AMD and Intel incompatible - What to do?

That's right, Athlon XP does not support long (64-bit) mode. Therefore, you should install the i386 (32-bit) version of the software. If you are interested in what exactly your CPU is capable of, you can run the CPU-Z utility.
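If a Linux system (for example a live CD) is already running on the machine, the same check can be done without extra tools by looking for the `lm` (long mode) flag in `/proc/cpuinfo`. Below is a minimal Python sketch; the function name `supports_long_mode` is made up here, and it only parses text in the usual `/proc/cpuinfo` layout.

```python
def supports_long_mode(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists 'lm'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the 'flags' entry, and match 'lm' as a whole word.
        if sep and key.strip() == "flags" and "lm" in value.split():
            return True
    return False
```

On Linux it could be called as `supports_long_mode(open("/proc/cpuinfo").read())`. A result of False means a 64-bit distribution will refuse to install, as in the message quoted above.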

0 Likes