eduardoschardong

SSE5 and AMD64 extensions

Discussion created by eduardoschardong on Oct 28, 2007
Latest reply on Dec 17, 2007 by joseph1110
Suggestions and comments

Hello,

I have read some comments on SSE5 around the internet, and it seems most programmers don't see a big advantage in it: too few new instructions.
Personally, I think there are few useful new instructions, and the main advantage is the new encoding (DREX), but it is being under-utilized. Why not extend it to all instructions, at least in long mode?
For example, a few opcodes become invalid in long mode. Those opcodes could be used as prefixes for the new encoding, instead of using three-byte opcodes:

- For SSE instructions with and without the 66h prefix, a 61h byte could indicate the new encoding, replacing the 0Fh. DREX.OP0 could indicate whether or not the 66h prefix applies:
[code]
ADDPS xmm1, xmm2          0F 58 /r
ADDPS xmm1, xmm2, xmm3    61 58 DREX.0 /r

ADDPD xmm1, xmm2          66 0F 58 /r
ADDPD xmm1, xmm2, xmm3    61 58 DREX.1 /r
[/code]

- For SSE instructions with the 0F2h or 0F3h prefix, a 62h byte could indicate the new encoding, replacing both the prefix and the 0Fh. DREX.OP0 could indicate which prefix applies, 0F3h (DREX.OP0 = 0) or 0F2h (DREX.OP0 = 1):
[code]
ADDSS xmm1, xmm2          F3 0F 58 /r
ADDSS xmm1, xmm2, xmm3    62 58 DREX.0 /r

ADDSD xmm1, xmm2          F2 0F 58 /r
ADDSD xmm1, xmm2, xmm3    62 58 DREX.1 /r
[/code]

- For the normal instructions, a 60h byte could indicate the new encoding. DREX.OP0 could stand in for the 66h prefix, switching between 16-bit and 32-bit operand size. Setting DREX.OP0 on an instruction with 8-bit operand size would extend it to 64 bits:
[code]
ADD al, dl                00 /r
ADD al, cl, dl            60 00 DREX.0 /r

ADD ax, dx                66 01 /r
ADD ax, cx, dx            60 01 DREX.1 /r

ADD eax, edx              01 /r
ADD eax, ecx, edx         60 01 DREX.0 /r

ADD rax, rdx              REX.W 01 /r
ADD rax, rcx, rdx         60 00 DREX.1 /r
[/code]
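
The three rules above can be summarized as a small lookup, sketched here in Python. This is only an illustration of the proposed mapping, not a real encoder: the names `SSE_MAP`, `sse_form` and `gp_form` are made up for this sketch, the byte values are taken from the examples in this post, and the rest of the DREX byte (register fields) and the ModRM byte are not modeled.

```python
# Hypothetical sketch of the proposed mapping; only the byte values
# (60h/61h/62h and the DREX.OP0 settings) come from the post.

# Legacy SSE prefix -> (new escape byte, DREX.OP0)
SSE_MAP = {
    None: (0x61, 0),  # no prefix, e.g. ADDPS
    0x66: (0x61, 1),  # 66h prefix, e.g. ADDPD
    0xF3: (0x62, 0),  # F3h prefix, e.g. ADDSS
    0xF2: (0x62, 1),  # F2h prefix, e.g. ADDSD
}


def sse_form(prefix, opcode):
    """Return (escape, opcode, DREX.OP0) for an SSE instruction 'prefix 0F opcode'."""
    escape, op0 = SSE_MAP[prefix]
    return (escape, opcode, op0)


def gp_form(byte_opcode, word_opcode, width):
    """Return (escape, opcode, DREX.OP0) for a general-purpose instruction.

    byte_opcode is the 8-bit form (00h for ADD), word_opcode the
    16/32-bit form (01h for ADD), width the operand size in bits.
    """
    table = {
        8: (byte_opcode, 0),   # 8-bit form, OP0 clear
        16: (word_opcode, 1),  # OP0 plays the role of the 66h prefix
        32: (word_opcode, 0),
        64: (byte_opcode, 1),  # 8-bit form with OP0 set means 64 bits
    }
    opcode, op0 = table[width]
    return (0x60, opcode, op0)
```

For example, `gp_form(0x00, 0x01, 64)` gives `(0x60, 0x00, 1)`, which matches the `60 00 DREX.1 /r` line for the 64-bit ADD above.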

There are some instructions for which this encoding won't work or doesn't make sense, so there is room for future extensions.
This change would allow all instructions to have 3 operands and access all 16 registers in just 4 bytes, improving code density and reducing the number of instructions needed. The gain would be bigger for SSE code, but it would still be useful for normal code.
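
To put a rough number on the density claim for the SSE case, here is a small byte count in Python. It assumes register-to-register forms only (no SIB byte or displacement), and that a non-destructive three-operand add is done today as a register copy (MOVAPS or MOVAPD) followed by the add. The function names are made up for this sketch.

```python
# Rough byte counts, register operands only. Helper names are hypothetical.

def legacy_len(prefix_bytes, needs_rex):
    # optional 66h prefix + optional REX + 0Fh + opcode + ModRM
    return prefix_bytes + (1 if needs_rex else 0) + 3


def legacy_three_operand_len(prefix_bytes, needs_rex):
    # a register copy (MOVAPS/MOVAPD) followed by the arithmetic
    # instruction; both have the same length in this simple model
    return 2 * legacy_len(prefix_bytes, needs_rex)


# Proposed form: escape byte + opcode + DREX + ModRM
PROPOSED_LEN = 4

# ADDPS, low registers: MOVAPS + ADDPS
addps_low = legacy_three_operand_len(0, False)
# ADDPD with xmm8-xmm15 involved: MOVAPD + ADDPD, each with 66h and REX
addpd_high = legacy_three_operand_len(1, True)
```

Under these assumptions `addps_low` is 6 bytes and `addpd_high` is 10 bytes, against 4 bytes for the proposed form in both cases.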

Comments are welcome. Thank you for reading the entire post, and sorry for my bad English...
