Is there a profiler for OpenCL on AMD devices that supports line-by-line profiling? For CUDA, nvprof already has a PC sampling option that gives per-line run-time information. For OpenCL, the only tool I can find right now is VTune for Intel devices (although this feature stopped working in VTune 2018). I really want to optimize my OpenCL code on an AMD device, and a line-by-line profiler would be a great help. Any suggestions?
We have been using CodeXL, but it does not give line-by-line timing information; it only reports performance counters that I cannot map to a particular line or segment of the code. Is this still the case?
Currently CodeXL supports only two levels of profiling: 1) API timeline trace and 2) kernel-level performance counters. Line-by-line profiling is not supported, but in most cases these two methods provide much of the information needed to analyze a performance bottleneck.
In static analyzer mode, CodeXL lets you navigate through the ISA code and see the estimated cost of each instruction in clock cycles. This is also a good way to analyze the kernel code in detail, though it requires some knowledge of the ISA.
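As a rough illustration of how per-instruction cycle estimates can be turned into a coarse picture of where a kernel spends its time, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the instruction list and cycle numbers are made-up placeholders typed in by hand (not CodeXL output or an export format), and `classify` and `summarize` are hypothetical helpers that bucket instructions by the usual GCN mnemonic prefixes (`v_`, `s_`, `ds_`, `buffer_`, `flat_`).

```python
from collections import defaultdict

# Hypothetical (mnemonic, estimated cycles) pairs, copied by hand from an ISA view.
# The cycle numbers are placeholders, not measurements from any real kernel.
ISA_ESTIMATES = [
    ("s_load_dwordx2", 8),
    ("v_mul_f32", 4),
    ("v_add_f32", 4),
    ("buffer_load_dword", 64),
    ("ds_read_b32", 16),
    ("v_mac_f32", 4),
    ("buffer_store_dword", 64),
]


def classify(mnemonic):
    """Bucket an instruction by its mnemonic prefix (assumed GCN naming)."""
    if mnemonic.startswith(("buffer_", "tbuffer_", "flat_", "image_")):
        return "vector memory"
    if mnemonic.startswith("ds_"):
        return "local data share"
    if mnemonic.startswith("v_"):
        return "vector ALU"
    if mnemonic.startswith("s_"):
        return "scalar"
    return "other"


def summarize(estimates):
    """Sum the estimated cycles per instruction category."""
    totals = defaultdict(int)
    for mnemonic, cycles in estimates:
        totals[classify(mnemonic)] += cycles
    return dict(totals)


if __name__ == "__main__":
    # Print categories from most to least expensive.
    ranked = sorted(summarize(ISA_ESTIMATES).items(), key=lambda kv: -kv[1])
    for category, cycles in ranked:
        print(f"{category:18s} {cycles:5d}")
```

This is not a substitute for line-by-line timing. Static estimates ignore latency hiding, cache behaviour, and occupancy, so treat the totals only as a hint about which part of the ISA listing to read first.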