1 Reply Latest reply on Nov 27, 2012 11:31 PM by kalyanpk

    confusing TBF hotspot on a simple "test ecx,ecx" instruction



      I have a hot loop containing the C++ statement "if (thing[index].start) {}". It is simple stuff: an index into an array of 8-byte structures. I sampled it with TBF at a 0.1 ms interval.


      Because it was so hot, I recoded it in various ways to try to improve it. In each case, I still get a hotspot on the actual "test ecx,ecx" instruction. This instruction will be executed a lot, but so will others in my loop. So I don't understand what TBF is telling me. Either the TEST instruction is very slow, or is particularly magnetic for samples, or CA is not working.


      Here is my latest attempt. I tried to take advantage of the structure size being 8 by recoding the lookup as "base + (index << 3)". That part is fast, but the test instruction (test eax,eax in this version) still kills me.

           f = thing + (((int)index) << 3);                                     2.41
      movzx eax,ax                               0F B7 C0
      shl eax,03h                                C1 E0 03                       0
      movsxd rcx,eax                             48 63 C8                       2.36
      add rcx,[rbp-08h]                          48 03 4D F8                    0.05
           int start = *(uint32_t*) f                                           0.18
      movsxd rax,[rcx]                           48 63 01                       0.18
           if (start)                                                           33.15
      test eax,eax                               85 C0                          33.15
      jz $+00000102h (0x13fb85d0e)               0F 84 FC 00 00 00
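
      For reference, the two forms of the lookup can be sketched in C++ as below. This is only an illustration: the `Thing` layout, its second field, and the helper names are assumptions, since the post only states that the structure is 8 bytes and has a `start` member that is read as a 32-bit value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-byte element; only `start` appears in the original post.
struct Thing {
    std::uint32_t start;
    std::uint32_t length;  // assumed second field, present only to make the size 8
};
static_assert(sizeof(Thing) == 8, "the shift-by-3 form assumes 8-byte elements");

// Plain array indexing: the compiler scales the index by sizeof(Thing).
inline std::uint32_t start_indexed(const Thing* thing, std::uint16_t index) {
    assert(thing != nullptr);
    return thing[index].start;
}

// Hand-scaled form: the byte offset is computed explicitly as index << 3.
inline std::uint32_t start_shifted(const Thing* thing, std::uint16_t index) {
    assert(thing != nullptr);
    const unsigned char* base = reinterpret_cast<const unsigned char*>(thing);
    const unsigned char* f = base + (static_cast<std::size_t>(index) << 3);
    return reinterpret_cast<const Thing*>(f)->start;
}
```

      Both helpers read the same 4 bytes from the same address, so the hand-scaled form changes how the address is computed but not the load that feeds the test.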


      Now, the movsxd and add are executed exactly the same number of times as the test. So why is the test showing as 33%, but the add as 0.05?


      Does this look plausible? Are the instruction-level statistics from TBF reliable at all? I had 250K samples in total, of which 30K were in my program, and 10K of those were on this exact test instruction. I know that TBF is a statistical estimate, but this test instruction seems to have a target painted on its back.
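
      As a rough sanity check on those counts, the arithmetic can be sketched as follows. The helper names are hypothetical, the counts are the rounded figures quoted above, and the conversion to seconds assumes that each sample stands for one 0.1 ms interval on one core.

```cpp
#include <cassert>
#include <cstdio>

// Share of a profile attributed to one instruction (hypothetical helper type).
struct SampleShare {
    double percent_of_program;  // samples on the instruction / samples in the program
    double seconds;             // samples on the instruction * sampling interval
};

inline SampleShare share(double on_instruction, double in_program, double interval_ms) {
    assert(in_program > 0.0 && interval_ms > 0.0);
    SampleShare s;
    s.percent_of_program = 100.0 * on_instruction / in_program;
    s.seconds = on_instruction * interval_ms / 1000.0;
    return s;
}

// Prints the share for the rounded counts from the post:
// 10K samples on the test, 30K in the program, 0.1 ms interval.
inline void print_post_numbers() {
    const SampleShare s = share(10000.0, 30000.0, 0.1);
    std::printf("%.1f%% of program samples, about %.1f s\n",
                s.percent_of_program, s.seconds);
}
```

      With the rounded counts this gives roughly a third of the program's samples, which is consistent with the 33.15 shown against the test in the listing.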



      Here is my sysinfo. Sorry about the processor.


      (It is a 64-bit exe.)


      Vendor: Intel, Family: Pentium III Processor

      Family 6 Model 10 Stepping 7

      Approximate core frequency is 2195 MHz

      Onboard local APIC detected.

      8 Processors Installed

      Computer Name:

      Windows 7 Version 6.1 Build 7601

      Service Pack 1


      Percentage of Memory is in use: 25

      Total physical memory: 8344472 KB

      Free physical memory: 6228544 KB

      Total size of paging file : 16687092 KB

      Free size of paging file: 14477820 KB

      Total virtual memory: 8589934464 KB

      Free virtual memory: 8589570212 KB


      Pcore.sys: 1.0.42

      CAProf.sys: 3.0.3

      dbghelp.dll: 6.8.4

      symsrv.dll: 6.12.2

      QtCore4.dll: 4.3.0

      QtGui4.dll: 4.3.0

      QtXml4.dll: 4.3.0

      CACommon.dll: 3.8.1203

      CADataTranslation.dll: 3.8.1203

      CAK86Disasm.dll: 3.8.1203

      CAProfileControl.dll: 3.8.1203

      CAProfileDataAccess.dll: 3.8.1203

      CAWinTaskInfo.dll: 3.8.1203

      CSSQuery.dll: 0.0.0

      CSSTranslator.dll: 3.8.1203

      DCConfig.dll: 3.8.1203

      LibCAData.dll: 3.7.1202

      ModAnalyzer.dll: 3.8.1203

      ViewConfig.dll: 3.8.1203

      WinSymbolEngine.dll: 3.8.1203


        • Re: confusing TBF hotspot on a simple "test ecx,ecx" instruction


          Time-based profiling is used to identify the hot spots in your code.

          Because CodeAnalyst uses statistical sampling, the samples that time-based profiling attributes to individual assembly instructions are not accurate.

          This is expected behavior with time-based profiling.


          To attribute samples to specific instructions, you need to use "Instruction Based Sampling" (IBS).

          With IBS, the samples are accurately attributed to instructions.


          IBS is specific to AMD processors (processor family 10h or above).

          As you are using an Intel platform, only time-based profiling is available.



          Kalyan P