0 Replies Latest reply on Sep 18, 2015 2:20 AM by maxdz8

    [Suggestion] Use "decorated" kernel names in "application timeline trace"

    maxdz8

      I use #define'd values (passed through the -D build option) to make some of my kernels switch behavior.

      The profiler keeps track of those kernels by decorating their names with the order of compilation and the device, so I get, for example:

       

      EntryPointName__k10_Capeverde1

      EntryPointName__k11_Capeverde1

       

      That's super cool.
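To make the naming scheme concrete, here is a minimal sketch that splits a decorated name back into its parts. The pattern (entry point, then `__k` plus a compilation index, then `_` plus a device label) is only inferred from the two names above; it is an assumption, not a documented format, and `split_decorated` is a hypothetical helper.

```python
import re

# Pattern inferred from the two example names in this post (assumption,
# not a documented profiler format): <entry>__k<index>_<device>
DECORATED = re.compile(r"^(?P<entry>.+)__k(?P<index>\d+)_(?P<device>.+)$")


def split_decorated(name):
    """Return (entry_point, compile_index, device), or None if the name does not match."""
    match = DECORATED.match(name)
    if match is None:
        return None
    return match.group("entry"), int(match.group("index")), match.group("device")


if __name__ == "__main__":
    for name in ("EntryPointName__k10_Capeverde1", "EntryPointName__k11_Capeverde1"):
        print(name, "->", split_decorated(name))
```

Both names share the same entry point and device label and differ only in the compilation index, which is exactly the piece of information the kernel summary appears to drop.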

       

      However, for some reason the "Kernel summary" uses the kernel entry point name from the source.

      This happens even though the application timeline trace knows something is different. For example, in my case the two kernels have the following timing and occupancy characteristics (taken from the "Host thread" timeline)

       

      Call   Device time   Occupancy %   VGPR/SGPR
      1st    692           50%           46/35
      2nd    2.10          50%           46/32

       

      So the application timeline trace is somehow tracking the fact those kernels are not the same thing.

      Yet in the kernel summary the two are merged together under a single name, so all the timings are badly skewed.
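A small sketch of why the merged summary is misleading, using the two device times quoted above. Which decorated name belongs to which call is my assumption for illustration, and `summarize` is a hypothetical helper, not anything the profiler exposes.

```python
from collections import defaultdict

# Device times from the two calls quoted in this post (units as shown by the
# profiler). The pairing of decorated names to calls is assumed.
calls = [
    ("EntryPointName__k10_Capeverde1", 692.0),
    ("EntryPointName__k11_Capeverde1", 2.10),
]


def summarize(records, key):
    """Group (name, time) records by key(name) and return {group: (count, total, mean)}."""
    groups = defaultdict(list)
    for name, time in records:
        groups[key(name)].append(time)
    return {g: (len(t), sum(t), sum(t) / len(t)) for g, t in groups.items()}


# Grouping by source entry point, as the kernel summary seems to do.
by_entry_point = summarize(calls, key=lambda name: name.split("__k")[0])

# Grouping by the full decorated name, as the timeline trace does.
by_variant = summarize(calls, key=lambda name: name)

if __name__ == "__main__":
    print("by entry point:", by_entry_point)
    print("by variant:    ", by_variant)
```

Grouped by entry point, the two calls collapse into one row with a mean of roughly 347, which describes neither variant; grouped by decorated name, each variant keeps its own timing.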

      I would like the kernel summary to be extended so that it works on the individual compiled objects instead of the source entry point, just as the other GPU profiling sections already do.

       

      Or: is it possible to somehow switch this feature on, or to work around this behavior?