3 Replies Latest reply on Jun 11, 2009 9:36 AM by edward_yang

    Merging behavior with setCC instructions?

    DarkShikari

      The Phenom arch manual talks about how:

      xor eax, eax

      mov al, foo

      is worse than

      movzx eax, foo

      because of merging penalties.

      But what about setCC?  What kind of merging penalties exist for the setCC instructions, which can only output to 8-bit registers, and don't zero the high bits?  Should I do:

      xor eax, eax

      setne al

      or should I do

      setne al

      movzx eax, al

      The former is faster on all the Intel chips I've tested, since the xor can be executed well in advance, but I don't know how Phenom merging penalties affect this.

        • Merging behavior with setCC instructions?
          edward_yang

           

          But what about setCC?  What kind of merging penalties exist for the setCC instructions, which can only output to 8-bit registers, and don't zero the high bits?  Should I do:

          xor eax, eax
          setne al

          or should I do

          setne al
          movzx eax, al

          The former is faster on all the Intel chips I've tested, since the xor can be executed well in advance, but I don't know how Phenom merging penalties affect this.



          What are you trying to accomplish with these two instructions? Won't "xor eax, eax" set the ZF flag, so that "setne al" always clears the byte?

            • Merging behavior with setCC instructions?
              DarkShikari

               

              Originally posted by: edward_yang
              But what about setCC?  What kind of merging penalties exist for the setCC instructions, which can only output to 8-bit registers, and don't zero the high bits?  Should I do:

               

              xor eax, eax
              setne al

              or should I do

              setne al
              movzx eax, al

               

              The former is faster on all the Intel chips I've tested, since the xor can be executed well in advance, but I don't know how Phenom merging penalties affect this.



               

              What are you trying to accomplish with these two instructions? Won't "xor eax, eax" set the ZF flag, so that "setne al" always clears the byte?

               

               

              Perhaps I should have made it clear that I was only posting a summary of the code, not the full code; obviously some flag-setting comparison would happen between the xor and the setne.
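
For readers following the thread, here is a rough C model of what the two sequences compute once the comparison is put back in. It is only a sketch of the architectural result: it says nothing about merging penalties or timing on Phenom or any other chip, which is the actual open question. The function names (write_al, setne_xor_first, setne_movzx_after, setne_unzeroed) and the stale_eax parameter are made up for illustration.

```c
#include <stdint.h>

/* Models a write to the low byte (al) of a 32-bit register (eax):
   the upper 24 bits keep whatever value they held before. */
static uint32_t write_al(uint32_t eax, uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}

/* Sequence 1: xor eax, eax / cmp a, b / setne al.
   The xor has to come before the compare because xor overwrites the flags. */
uint32_t setne_xor_first(uint32_t stale_eax, uint32_t a, uint32_t b) {
    uint32_t eax = stale_eax;
    eax ^= eax;                       /* xor eax, eax: eax becomes 0 */
    uint8_t not_equal = (a != b);     /* cmp a, b: ZF = (a == b) */
    eax = write_al(eax, not_equal);   /* setne al */
    return eax;
}

/* Sequence 2: cmp a, b / setne al / movzx eax, al. */
uint32_t setne_movzx_after(uint32_t stale_eax, uint32_t a, uint32_t b) {
    uint32_t eax = stale_eax;
    uint8_t not_equal = (a != b);     /* cmp a, b */
    eax = write_al(eax, not_equal);   /* setne al: upper bits still stale */
    eax = (uint8_t)eax;               /* movzx eax, al: zero-extend the low byte */
    return eax;
}

/* setne al with no zeroing at all: the stale upper bits leak through. */
uint32_t setne_unzeroed(uint32_t stale_eax, uint32_t a, uint32_t b) {
    return write_al(stale_eax, (uint8_t)(a != b));
}
```

Both zeroing variants return 0 or 1 regardless of what eax held beforehand, while setne_unzeroed(0xDEADBEEF, 1, 2) yields 0xDEADBE01. That is why one of the two extra instructions is needed; which one is cheaper is a microarchitectural matter the model does not capture.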