3 Replies Latest reply on Dec 17, 2007 12:33 PM by joseph1110

    SSE5 and AMD64 extensions

    eduardoschardong
      Suggestions and comments

      Hello,

      I have read some comments on SSE5 around the internet and get the impression that most programmers don't see a big advantage in it: too few new instructions.
      Personally, I think it has only a few useful new instructions, and its main advantage is the new encoding (DREX). But that encoding is under-utilized. Why not extend it to all instructions, at least in long mode?
      For example, a few opcodes become invalid in long mode. Those opcodes could serve as prefixes for the new encoding, instead of using three-byte opcodes:

      - For SSE instructions with or without the 66h prefix, 61h could indicate the new encoding, replacing the 0Fh byte. DREX.OP0 would indicate whether the 66h prefix applies:
      [code]
      ADDPS xmm1, xmm2          0F 58 /r
      ADDPS xmm1, xmm2, xmm3    61 58 DREX.0 /r

      ADDPD xmm1, xmm2          66 0F 58 /r
      ADDPD xmm1, xmm2, xmm3    61 58 DREX.1 /r
      [/code]

      - For SSE instructions with the F2h or F3h prefix, 62h could indicate the new encoding, replacing both the prefix and the 0Fh byte. DREX.OP0 would select between F3h (0) and F2h (1):
      [code]
      ADDSS xmm1, xmm2          F3 0F 58 /r
      ADDSS xmm1, xmm2, xmm3    62 58 DREX.0 /r

      ADDSD xmm1, xmm2          F2 0F 58 /r
      ADDSD xmm1, xmm2, xmm3    62 58 DREX.1 /r
      [/code]

      - For the normal (general-purpose) instructions, 60h could indicate the new encoding. DREX.OP0 would stand in for the 66h prefix, switching between 16-bit and 32-bit operand size; setting DREX.OP0 on an instruction with 8-bit operands would extend it to 64 bits:
      [code]
      ADD al, dl                00 /r
      ADD al, cl, dl            60 00 DREX.0 /r

      ADD ax, dx                66 01 /r
      ADD ax, cx, dx            60 01 DREX.1 /r

      ADD eax, edx              01 /r
      ADD eax, ecx, edx         60 01 DREX.0 /r

      ADD rax, rdx              REX.W 01 /r
      ADD rax, rcx, rdx         60 00 DREX.1 /r
      [/code]
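
      To make the three cases above concrete, here is a minimal sketch of how an assembler might emit the proposed byte sequences for register-to-register forms. Everything in it is hypothetical: the function and constant names are made up for illustration, the byte order (escape, opcode, DREX, ModRM) simply follows the listings above, the DREX bit layout (destination in bits 7..4, the OP0 bit in bit 3, then R, X, B) is an assumption, and so is the choice of putting the two sources in ModRM.reg and ModRM.rm.

```python
# Escape bytes proposed in the post (60h, 61h and 62h are invalid in long mode).
ESCAPE_GPR = 0x60         # "normal" general-purpose instructions
ESCAPE_SSE_PACKED = 0x61  # SSE forms that use no prefix or 66h
ESCAPE_SSE_SCALAR = 0x62  # SSE forms that use F3h or F2h


def drex_byte(dest, op0, r=0, x=0, b=0):
    """Pack a DREX byte. ASSUMED layout: dest[7:4], OP0[3], R[2], X[1], B[0]."""
    return ((dest & 0xF) << 4) | ((op0 & 1) << 3) | ((r & 1) << 2) | ((x & 1) << 1) | (b & 1)


def encode(escape, opcode, op0, dest, src1, src2):
    """Encode a hypothetical 3-operand register form: dest = src1 OP src2.

    ASSUMPTION: src1 goes in ModRM.reg (extended by DREX.R) and
    src2 in ModRM.rm (extended by DREX.B); mod = 11b (register direct).
    """
    for reg in (dest, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("register number must be in 0..15")
    drex = drex_byte(dest, op0, r=src1 >> 3, b=src2 >> 3)
    modrm = 0xC0 | ((src1 & 7) << 3) | (src2 & 7)
    return bytes([escape, opcode, drex, modrm])


if __name__ == "__main__":
    # ADDPS xmm1, xmm2, xmm3  ->  61 58 DREX.0 /r
    print(encode(ESCAPE_SSE_PACKED, 0x58, 0, 1, 2, 3).hex())   # 615810d3
    # ADDPD xmm1, xmm2, xmm3  ->  61 58 DREX.1 /r
    print(encode(ESCAPE_SSE_PACKED, 0x58, 1, 1, 2, 3).hex())   # 615818d3
    # ADDSD xmm9, xmm10, xmm11 -> 62 58 DREX.1 /r, using the upper registers
    print(encode(ESCAPE_SSE_SCALAR, 0x58, 1, 9, 10, 11).hex())  # 62589dd3
    # ADD rax, rcx, rdx       ->  60 00 DREX.1 /r
    print(encode(ESCAPE_GPR, 0x00, 1, 0, 1, 2).hex())           # 600008ca
```

      Every register-to-register form comes out at exactly 4 bytes, regardless of which of the 16 registers are used, because the register extension bits live in the DREX byte rather than in a separate REX prefix.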

      There are some instructions for which this encoding won't work or doesn't make sense, so there is room left for future extensions.
      This change would let every instruction take three operands and reach all 16 registers in just 4 bytes, improving code density and reducing the number of instructions needed. The gain would be bigger for SSE code, but it would still be useful for normal code.
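
      As a rough check of the code-density claim, the sketch below counts bytes for a non-destructive packed add (dest = a + b with both sources preserved). With the existing two-operand encoding this needs a register copy (MOVAPS/MOVAPD) followed by the add; with the proposed encoding it is a single 4-byte instruction. The helper names are hypothetical, and only register-to-register forms are counted.

```python
PROPOSED_LEN = 4  # escape + opcode + DREX + ModRM, as proposed in the post


def legacy_len(prefix_bytes, needs_rex):
    """Length of one existing two-operand SSE register form:
    [66h/F2h/F3h prefix] [REX] 0F opcode ModRM."""
    return prefix_bytes + (1 if needs_rex else 0) + 3


def nondestructive_legacy_len(prefix_bytes, needs_rex):
    """Copy + add: two instructions of the same shape,
    e.g. MOVAPS xmm1, xmm2 (0F 28 /r) then ADDPS xmm1, xmm3 (0F 58 /r)."""
    return 2 * legacy_len(prefix_bytes, needs_rex)


if __name__ == "__main__":
    cases = [
        ("ADDPS, xmm0-7", 0, False),
        ("ADDPS, xmm8-15", 0, True),
        ("ADDPD, xmm0-7", 1, False),
        ("ADDPD, xmm8-15", 1, True),
    ]
    for name, prefix_bytes, needs_rex in cases:
        old = nondestructive_legacy_len(prefix_bytes, needs_rex)
        print(f"{name}: {old} bytes today vs {PROPOSED_LEN} bytes proposed")
```

      Under these assumptions the existing encoding needs 6 to 10 bytes and two instructions where the proposal needs 4 bytes and one. When the destination may be overwritten, a single existing instruction is only 3 to 5 bytes, so the saving applies mainly where a copy would otherwise be needed.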

      Comments are welcome. Thank you for reading the entire post, and sorry for my bad English...
        • SSE5 and AMD64 extensions
          m210658
          Hello Eduardo,

          this is a very interesting suggestion!

          Yes, we have considered adding 3-operand versions of some of the "standard" SSE instructions to the instruction set. We decided against it, at this time, as there is some cost involved (instruction decoder, verification and, not least, enablement in the SW tool chain).
          There is also some risk in reusing existing opcodes - even more so if they introduce a mode dependency - and a more likely path of future expansion will be using the new opcode space at 0F 24/25/....

          Even so, I have to admit, your proposal looks very elegant!

          Regards,

          Michael Frank
          AMD Fellow, Architecture Extensions Group


          This response is provided for informational purposes only, is provided "AS IS" and does not obligate AMD to provide any of the services, technology, or programs described.
          • SSE5 and AMD64 extensions
            evxxvi
            The killer advantage for me is POPCNT! I really need that instruction and can hardly wait for my next-gen Opteron!
            • SSE5 and AMD64 extensions
              joseph1110
              Hi Michael,


              I currently use a PC based on an AMD Athlon 64 X2 Dual-Core Processor 5600+ under Red Hat Linux Fedora 7 and
              do math, et cetera, as a hobby.
              Will a Phenom/Barcelona FPU give me ~34 (not 19) digits of floating-point precision for "long double"
              C/C++ variables, please?
              Could you run a simple test using GCC for me, which I could supply, please?