Dark_Shikari
Journeyman III

Merging behavior with setCC instructions?

The Phenom arch manual talks about how:

xor eax, eax
mov al, foo

is worse than

movzx eax, foo

because of merging penalties.

But what about setCC?  What kind of merging penalties exist for the setCC instructions, which can only output to 8-bit registers, and don't zero the high bits?  Should I do:

xor eax, eax
setne al

or should I do

setne al
movzx eax, al

The former is faster on all the Intel chips I've tested, since the xor can be executed well in advance, but I don't know how Phenom merging penalties affect this.

3 Replies
edward_yang
Journeyman III

Originally posted by: Dark_Shikari
But what about setCC?  What kind of merging penalties exist for the setCC instructions, which can only output to 8-bit registers, and don't zero the high bits?  Should I do:

xor eax, eax
setne al

or should I do

setne al
movzx eax, al

The former is faster on all the Intel chips I've tested, since the xor can be executed well in advance, but I don't know how Phenom merging penalties affect this.



What are you trying to accomplish with these two instructions? Won't "xor eax, eax" set the ZF flag, and thus always make "setne al" clear the byte?


Originally posted by: edward_yang
But what about setCC?  What kind of merging penalties exist for the setCC instructions, which can only output to 8-bit registers, and don't zero the high bits?  Should I do:

xor eax, eax
setne al

or should I do

setne al
movzx eax, al

The former is faster on all the Intel chips I've tested, since the xor can be executed well in advance, but I don't know how Phenom merging penalties affect this.



 

What are you trying to accomplish with these two instructions? Won't "xor eax, eax" set the ZF flag, and thus always make "setne al" clear the byte?

 

Perhaps I should have made it clear that I was only posting a summary of the code, not the full code; obviously some comparison would happen between the xor and the set.
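
For anyone following along, the two complete sequences can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the original post: the function names `ne_xor_first` and `ne_movzx_after` are hypothetical, the 32-bit compare is an assumed stand-in for whatever comparison sits between the xor and the set, and the inline assembly uses GCC/Clang AT&T syntax (the Intel-syntax equivalents are in the comments). On non-x86 targets the functions fall back to plain C.

```c
/* Variant A: zero the full register first, then compare, then setne.
 * The xor must come BEFORE the compare, because xor itself rewrites
 * the flags (it sets ZF). */
unsigned ne_xor_first(unsigned a, unsigned b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned r;
    __asm__ (
        "xorl %0, %0\n\t"    /* xor  eax, eax  (clears all 32 bits)   */
        "cmpl %2, %1\n\t"    /* cmp  a, b      (assumed comparison)   */
        "setne %b0"          /* setne al       (writes low byte only) */
        : "=&q"(r)           /* early-clobber: r must not alias a or b */
        : "r"(a), "r"(b)
        : "cc");
    return r;
#else
    return a != b;           /* portable fallback */
#endif
}

/* Variant B: compare, setne into the low byte, then zero-extend. */
unsigned ne_movzx_after(unsigned a, unsigned b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned r;
    __asm__ (
        "cmpl %2, %1\n\t"    /* cmp   a, b                            */
        "setne %b0\n\t"      /* setne al       (upper bits unchanged) */
        "movzbl %b0, %0"     /* movzx eax, al  (clears upper bits)    */
        : "=q"(r)
        : "r"(a), "r"(b)
        : "cc");
    return r;
#else
    return a != b;           /* portable fallback */
#endif
}
```

Both functions return 1 when the operands differ and 0 otherwise; the only difference is where the upper bits of the result register get cleared, which is the point the merging question turns on.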


In that case the XOR instruction will be executed well before the instruction that uses the result of setne, so any false dependency on that should've been resolved. I.e., there shouldn't be any penalty.
