I tried time-based sampling of a program that runs an OpenCL kernel on the CPU, hoping the call graph would give me granular information about where the hotspots in my kernel are.
I'm not able to locate any part of my kernel in the call graph though.
Most samples are taken in the modules libamdocl64.so, libamdocl12cl64.so and OCL4371T1.so. I suspect that my kernel is compiled into the last of these, because of its name, its location under /tmp, and its timer value. However, the symbols for these modules are marked as not loaded, so the resulting profiling data is of little use.
I tried disabling optimizations and enabling debugging information when building my program, like this: