
confusing TBF hotspot on a simple "test ecx,ecx" instruction

Question asked by smitchellserena on Nov 27, 2012
Latest reply on Nov 27, 2012 by kalyanpk

Hello,

I have a hot loop containing the C++ statement "if (thing[index].start) {}". Simple stuff: it indexes into an array of 8-byte structures. Sampled with TBF at 0.1 ms.

 

Because it was so hot, I recoded it in various ways to try to improve it. In each case, I still get a hotspot on the actual "test ecx,ecx" instruction. This instruction will be executed a lot, but so will others in my loop. So I don't understand what TBF is telling me. Either the TEST instruction is very slow, or is particularly magnetic for samples, or CA is not working.

 

Here is my latest attempt. I tried to take advantage of the structure size being 8 by recoding the lookup as "base + (index << 3)". The address computation is fast, but the test instruction (test eax,eax in this version) still kills me.

     f = thing + (((int)index) << 3);                       2.41  
movzx eax,ax                               0F B7 C0             
shl eax,03h                                C1 E0 03        0
movsxd rcx,eax                             48 63 C8        2.36  
add rcx,[rbp-08h]                          48 03 4D F8     0.05  
     int start = *(uint32_t*) f                             0.18  
movsxd rax,[rcx]                           48 63 01        0.18  
     if (start)                                             33.15 
test eax,eax                               85 C0           33.15 
jz $+00000102h (0x13fb85d0e)               0F 84 FC 00 00 00    
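
For reference, the two forms of the lookup can be sketched as below. This is a minimal illustration under assumptions, not the actual program: the struct name `Thing`, its second field, the 16-bit index type (guessed from the `movzx eax,ax`), and the helper names `start_indexed` and `start_shifted` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte record; only a `start` field is mentioned in the post.
struct Thing {
    std::uint32_t start;
    std::uint32_t other;  // assumed second field, just to reach 8 bytes
};
static_assert(sizeof(Thing) == 8, "the shift-by-3 form relies on an 8-byte record");

// Plain indexed form: the compiler scales the index by sizeof(Thing) itself.
inline std::uint32_t start_indexed(const Thing* things, std::uint16_t index) {
    assert(things != nullptr);
    return things[index].start;
}

// Hand-scaled form: byte base address plus (index << 3).
inline std::uint32_t start_shifted(const Thing* things, std::uint16_t index) {
    assert(things != nullptr);
    const unsigned char* base = reinterpret_cast<const unsigned char*>(things);
    const unsigned char* f = base + (static_cast<int>(index) << 3);
    std::uint32_t start;
    std::memcpy(&start, f, sizeof start);  // read the first 4 bytes of the record
    return start;
}
```

Both functions read the same four bytes, and an optimizing compiler will typically emit much the same scaled address computation for either, so the hand-written shift is unlikely by itself to move where the samples land.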

 

Now, the movsxd and add are executed exactly the same number of times as the test. So why does the test show 33% while the add shows 0.05%?

 

Does this look plausible? Are the asm instruction-level stats from TBF reliable at all? I had 250K samples in total, of which 30K were in my program. Of these, 10K were on this exact test ecx,ecx instruction. I know that TBF gives a statistical estimate, but this test instruction seems to have a target painted on its back.
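
As a sanity check on those counts, here is a small sketch of the arithmetic. The helper name `percent` and the constant names are hypothetical, and the counts are the rounded figures quoted above with "K" taken as 1000.

```cpp
#include <cassert>
#include <cstdio>

// Percentage of `part` within `whole`.
inline double percent(double part, double whole) {
    assert(whole > 0.0);
    return 100.0 * part / whole;
}

// Rounded sample counts as quoted in the post.
const double kTotalSamples   = 250000.0;  // all samples in the session
const double kProgramSamples = 30000.0;   // samples attributed to the program
const double kTestSamples    = 10000.0;   // samples on the test instruction

// Prints each share so the figures can be compared with the listing.
inline void print_shares() {
    std::printf("program / total: %.2f%%\n", percent(kProgramSamples, kTotalSamples));
    std::printf("test / program:  %.2f%%\n", percent(kTestSamples, kProgramSamples));
    std::printf("test / total:    %.2f%%\n", percent(kTestSamples, kTotalSamples));
}
```

The test instruction's share of the program's samples comes out at roughly a third, which is in line with the 33.15 shown in the listing. That suggests the percentage column is relative to the program's samples rather than to the whole session.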

 

------------------------------------------------------------

Here is my sysinfo (sorry about the processor).

(It is a 64-bit exe.)

 

Vendor: Intel, Family: Pentium III Processor

Family 6 Model 10 Stepping 7

Approximate core frequency is 2195 MHz

Onboard local APIC detected.

8 Processors Installed

Computer Name:

Windows 7 Version 6.1 Build 7601

Service Pack 1

Memory:

Percentage of Memory is in use: 25

Total physical memory: 8344472 KB

Free physical memory: 6228544 KB

Total size of paging file : 16687092 KB

Free size of paging file: 14477820 KB

Total virtual memory: 8589934464 KB

Free virtual memory: 8589570212 KB

 

Pcore.sys: 1.0.42

CAProf.sys: 3.0.3

dbghelp.dll: 6.8.4

symsrv.dll: 6.12.2

QtCore4.dll: 4.3.0

QtGui4.dll: 4.3.0

QtXml4.dll: 4.3.0

CACommon.dll: 3.8.1203

CADataTranslation.dll: 3.8.1203

CAK86Disasm.dll: 3.8.1203

CAProfileControl.dll: 3.8.1203

CAProfileDataAccess.dll: 3.8.1203

CAWinTaskInfo.dll: 3.8.1203

CSSQuery.dll: 0.0.0

CSSTranslator.dll: 3.8.1203

DCConfig.dll: 3.8.1203

LibCAData.dll: 3.7.1202

ModAnalyzer.dll: 3.8.1203

ViewConfig.dll: 3.8.1203

WinSymbolEngine.dll: 3.8.1203

 
