Can someone explain something?

Discussion created by Morantex on Dec 3, 2009
Latest reply on Sep 14, 2010 by pdrongowski
Why is "add rax,rcx" so costly?

I have seen the following in the Code Analyst breakdown of a small function. Each of these machine instructions is executed the same number of times:

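mov rax,[rdx]                 ; load a 64-bit value from the address in rdx
mov rax,[rcx+rax+00000150h]   ; load again, using the value just loaded as part of the address
add rax,rcx                   ; add rcx to the result of the second load
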
Now the first 'mov' line takes 4 CPU clocks and the second 'mov' line takes 2 CPU clocks, but the 'add' line takes 63 CPU clocks (these are the figures shown in the source code statistics in Code Analyst).
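To make the dependency chain in that listing explicit, here is a hypothetical C fragment of the kind an optimizing compiler might turn into those three instructions. The function name, parameter names and the field layout are assumptions for illustration; the real source is not shown. It also assumes, purely for illustration, that rcx and rdx hold the function's first two integer arguments, as they would under the Windows x64 calling convention.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction: 'base' plays the role of rcx and
   'offset_ptr' the role of rdx. Each step needs the previous result. */
uintptr_t resolve_entry(const unsigned char *base, const int64_t *offset_ptr)
{
    /* mov rax,[rdx]: load a 64-bit offset through the second pointer. */
    int64_t offset = *offset_ptr;

    /* mov rax,[rcx+rax+00000150h]: this address is only known once the
       first load has completed. memcpy avoids alignment assumptions. */
    int64_t rel;
    memcpy(&rel, base + offset + 0x150, sizeof rel);

    /* add rax,rcx: needs the value produced by the second load. */
    return (uintptr_t)base + (uintptr_t)rel;
}
```

Whatever the real source looks like, the listing forms this kind of chain: the second load's address depends on the first load, and the add depends on the second load.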

I'm seeing this sort of unexpected disparity in many places as I profile a large API and its test programs: innocent-looking machine instructions that appear to take far longer than similar ones nearby.

Is the displayed 'CPU clocks' figure reliable? The 'hot' instructions don't seem to change between runs, so I guess it is.
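One way to probe that question is to profile a small workload whose behaviour is known in advance. The sketch below is not part of the original code, and make_ring and chase are hypothetical names. It builds a single random cycle of indices with Sattolo's algorithm and then walks it, so every load's address comes from the previous load, the same dependent-load shape as the listing above. Running it under the same "Assess Performance" session and noting which instruction in the tiny loop collects the samples gives a baseline for reading the counts in the real code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: build one random cycle over n slots (Sattolo's
   algorithm), so following next[i] visits every slot before returning. */
size_t *make_ring(size_t n, unsigned seed)
{
    if (n == 0) return NULL;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i: Sattolo, not Fisher-Yates */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
    return next;
}

/* Hypothetical helper: follow the ring for 'steps' hops. Each load's
   address depends on the previous load, so the loads cannot overlap. */
size_t chase(const size_t *next, size_t steps)
{
    size_t i = 0;
    while (steps--) i = next[i];
    return i;   /* returned so the compiler cannot discard the loop */
}
```

A driver would call make_ring with a slot count large enough that the array is several times bigger than the processor's caches, then call chase repeatedly while profiling. With MSVC, RAND_MAX is only 32767, which limits how random the ring is for large n but does not break the single-cycle property, since Sattolo's algorithm only requires j < i.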

This is optimized x64 C code running under Vista x64 on an Athlon X2 6000+ with 8GB of RAM, profiled from Visual Studio 2008 with the Code Analyst add-in.

The profile configuration used is simply "Assess Performance".

Many of the 'hot functions' we are zooming in on turn out to have this kind of bizarre cause: isolated little instructions that seem to consume lots of cycles.

I'm no guru on the internals of x64 processors or on x64 instruction timings, but these numbers do look suspicious.

I'd appreciate any insights into what is going on here.