4 Replies Latest reply on May 23, 2011 6:21 AM by techforums123

    What does the Time counter in AMD APP Profiler list?


      The AMD APP Profiler documentation states that the Time counter is

      "For a kernel dispatch operation: time spent executing the kernel in milliseconds (does not include the kernel setup time). For a buffer or image object operation, time spent transferring data in milliseconds."

      This makes me wonder whether "time spent executing the kernel" is wall clock time or CPU (GPU) time.

      It seems to be the CPU (GPU) time, the time during which kernel instructions are actively being processed, excluding any wait times on memory operations etc.

      Let me explain why I think this is the case:

      I profiled 2 different versions of my application, let's call them FAST and SLOW. Both have the same domain size and the same number of work items; they differ in memory access patterns and buffer entry sizes (float2 vs float3).

      SLOW had a low Time counter and a high FetchUnitStalled percentage.
      FAST had a high Time counter and a low FetchUnitStalled percentage.

      Still, FAST executed visibly faster than SLOW. This was also verified by measuring the wall clock time in the client application (including glFinish before and afterwards).
      If Time indeed lists the CPU (GPU) time, then this would make sense: even though that time is larger in FAST, different wavefronts might be executed while others wait for memory operations, so the whole kernel finishes faster in FAST. In SLOW, on the other hand, the compiler/runtime/driver might not decide to switch wavefronts and instead perform all those tiny waits, taking more time in total because the memory access stalls are not hidden.
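      The reasoning above can be sketched with a toy model (all numbers are made up for illustration; this is not real profiler data). Each wavefront needs some "active" ALU time and some memory-wait time; when stalls are hidden, one wavefront's ALU work overlaps another's memory wait:

```python
# Toy model of wavefront latency hiding (made-up numbers, not real profiler data).

def wall_time(n_wavefronts, alu_ms, mem_ms, hide_stalls):
    """Rough wall-clock estimate for one compute unit.

    hide_stalls=True : while one wavefront waits on memory, another runs
                       ALU work, so waits overlap with computation.
    hide_stalls=False: every wavefront's wait is paid in full, serially.

    Returns (wall_clock_ms, active_ms); active_ms is what a counter that
    measures only active processing would report.
    """
    active = n_wavefronts * alu_ms
    if hide_stalls:
        # Only the first wait is exposed; later waits overlap with ALU
        # work of other wavefronts (assuming enough wavefronts in flight).
        return max(active, mem_ms + alu_ms), active
    return active + n_wavefronts * mem_ms, active

# "FAST": more ALU work per wavefront, but stalls are hidden.
fast_wall, fast_active = wall_time(64, alu_ms=0.02, mem_ms=0.05, hide_stalls=True)
# "SLOW": less ALU work per wavefront, but every stall is exposed.
slow_wall, slow_active = wall_time(64, alu_ms=0.01, mem_ms=0.05, hide_stalls=False)

print(f"FAST: active={fast_active:.2f} ms  wall={fast_wall:.2f} ms")
print(f"SLOW: active={slow_active:.2f} ms  wall={slow_wall:.2f} ms")
# FAST reports MORE active time yet finishes in LESS wall-clock time,
# matching the observation in the post.
```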

      Does Time really list CPU (GPU) time?

      And what does GPUTime refer to in the description of FetchUnitStalled
      ("The percentage of GPUTime the Fetch unit is stalled.")?


        • What does the Time counter in AMD APP Profiler list?


          Nice investigation. But let's go by the definitions mentioned in the programming guide.

          Time: the GPU time from the instant the kernel was launched to the instant it finished. So it should include all the fetch, ALU, and write times; otherwise the other two definitions would have no significance at all.

          ALUBusy: The percentage of GPUTime ALU instructions are processed. Value range: 0% (bad) to 100% (optimal).

          FetchUnitBusy: The percentage of GPUTime the Fetch unit is active. The result includes the stall time (FetchUnitStalled). This is measured with all extra fetches and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound).

          Similarly for WriteUnitBusy.

          Some of these operations may happen in parallel (for example, one wavefront stalled on a write while execution continues on another), which reasonably explains the choice of expressing these counters as percentages of GPUTime. So we can estimate how much the writes and fetches affect the ALU, and how much of their latency is hidden.
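          A back-of-envelope reading of the counters (with made-up example values, not measurements): if the busy percentages are fractions of GPUTime, their absolute times can sum to more than GPUTime, and the excess must be work that ran in parallel, i.e. hidden latency:

```python
# All counter values below are invented for illustration.
gpu_time_ms    = 10.0   # the "Time" counter for one kernel dispatch
alu_busy_pct   = 60.0   # ALUBusy
fetch_busy_pct = 70.0   # FetchUnitBusy (includes FetchUnitStalled)

# Convert percentages of GPUTime into absolute times.
alu_ms   = gpu_time_ms * alu_busy_pct / 100.0    # 6.0 ms of ALU work
fetch_ms = gpu_time_ms * fetch_busy_pct / 100.0  # 7.0 ms of fetch activity

# 6.0 + 7.0 = 13.0 ms of unit activity fit inside 10.0 ms of GPUTime,
# so at least 3.0 ms of fetch activity was overlapped with ALU work.
min_overlap_ms = max(0.0, alu_ms + fetch_ms - gpu_time_ms)
print(f"ALU: {alu_ms} ms, fetch: {fetch_ms} ms, overlap >= {min_overlap_ms} ms")
```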

          Please post the code you used to obtain your results, and mention your system details.



          These are my personal ideas and may not be 100% correct, although I try to give the best information.