I use #define'd values (passed with the -D build option) to switch the behavior of some of my kernels.
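This is a minimal sketch of the kind of setup described above. It assumes a kernel whose body is selected at build time. It is written in plain C so it can be compiled and checked on its own; OpenCL C uses the same preprocessor, and there the define would go into the clBuildProgram options string (e.g. "-DSCALE=2"). The names SCALE and scale_kernel are hypothetical.

```c
/* Hypothetical stand-in for one of the kernels. A -D option picks the
 * behavior at build time; SCALE and scale_kernel are illustrative names. */
#include <assert.h>
#include <stddef.h>

#ifndef SCALE
#define SCALE 1                  /* default build; override with -DSCALE=2 */
#endif

/* Same entry-point name in every build; only the compiled body differs. */
void scale_kernel(const float *in, float *out, int n)
{
    assert((in != NULL && out != NULL) || n == 0);
    for (int i = 0; i < n; ++i) {
#if SCALE == 1
        out[i] = in[i];          /* default variant: plain copy */
#else
        out[i] = in[i] * SCALE;  /* other variants: scaled copy */
#endif
    }
}
```

If you build this once with no options and once with -DSCALE=2, you get two different binaries that both export scale_kernel. A summary keyed only on the source entry-point name cannot tell those two builds apart.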
The profiler keeps track of those kernels by decorating their names with the compilation order and the device, so I have, for example:
That's super cool.
However, for some reason the "Kernel summary" uses the kernel entry-point name from the source.
This happens even though the application timeline trace can tell the kernels apart. For example, in my case the two kernels have the following timing and occupancy characteristics (taken from the "Host thread" timeline):
| Call | Device time | Occupancy % VGPR/SGPR |
|---|---|---|
So the application timeline trace does track the fact that these kernels are not the same thing.
Yet in the Kernel summary, the two are merged into a single entry, so all the aggregated timings are badly skewed.
I would like the Kernel summary to group results by compiled kernel object instead of by source entry point, as the other GPU profiling sections already do.
Alternatively, is there a way to switch this on, or to work around this behavior?
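One possible workaround is to paste a per-build suffix onto the entry-point name, so that each compiled variant exposes a distinct name. Whether the Kernel summary then separates the variants is an untested assumption: it only helps if the summary groups by the kernel name as it appears after preprocessing. The macros VARIANT, PASTE, KERNEL_NAME and KERNEL_NAME_STR, and the kernel name my_kernel, are hypothetical. The sketch is plain C so the preprocessor behavior can be checked on its own, and OpenCL C uses the same preprocessor.

```c
/* Hypothetical workaround sketch: give each build variant its own
 * entry-point name by pasting a suffix chosen with -D. All macro and
 * kernel names here are illustrative, not part of any profiler or
 * OpenCL API. The same macros can sit at the top of a .cl file. */
#include <assert.h>
#include <stddef.h>

#ifndef VARIANT
#define VARIANT A               /* override per build, e.g. -DVARIANT=B */
#endif

/* Two-level expansion: VARIANT is replaced by its value before pasting. */
#define PASTE_(a, b) a##_##b
#define PASTE(a, b) PASTE_(a, b)
#define KERNEL_NAME(base) PASTE(base, VARIANT)

/* Stringify after expansion to get the name the host must look up. */
#define STR_(x) #x
#define STR(x) STR_(x)
#define KERNEL_NAME_STR(base) STR(KERNEL_NAME(base))

/* Expands to my_kernel_A by default, my_kernel_B with -DVARIANT=B, etc.
 * In OpenCL C this would read:
 *   __kernel void KERNEL_NAME(my_kernel)(__global float *out) */
void KERNEL_NAME(my_kernel)(float *out, int n)
{
    assert(out != NULL || n == 0);
    for (int i = 0; i < n; ++i)
        out[i] = (float)i;      /* the variant-specific body would go here */
}

/* The string the host would pass to clCreateKernel for this build. */
const char *kernel_entry_name(void)
{
    return KERNEL_NAME_STR(my_kernel);
}
```

In the real setup, the host would add something like -DVARIANT=B to the clBuildProgram options string, alongside the defines that already select the behavior. It would then request the kernel with clCreateKernel(program, "my_kernel_B", &err). The trade-off is that the host has to know the suffix used for every build.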