In static analyzer mode, CodeXL lets you navigate through the ISA code to see the estimated cost of each instruction in clock cycles. I'm not sure whether this information helps you.
Currently, CodeXL supports only two levels of profiling: 1) API timeline trace and 2) kernel-level performance counters.