I'm interested in analyzing OpenCL kernels within the compiler; in particular, I'd like to count various types of operations (e.g. the number of floating-point adds, the number of loads, etc.). I can do this easily in GCC by instrumenting the code ("-fprofile-generate"), running the instrumented code, and loading the resulting basic block counts back into the compiler ("-fprofile-use"). Any pass I add to GCC at that point can see how many times each basic block was executed and, in turn, how many times each instruction was executed.
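To make the counting step concrete, here is a minimal sketch of the arithmetic: each basic block's static instruction mix is multiplied by the number of times that block ran. The block names, execution counts, and operation categories below are made up for illustration; they do not come from a real GCC or LLVM profile, and `count_operations` is a hypothetical helper, not a compiler API.

```python
from collections import Counter


def count_operations(block_counts, block_ops):
    """Scale each block's static instruction mix by its execution count.

    block_counts: block name -> number of times the block executed
    block_ops:    block name -> {operation kind: occurrences in the block}
    """
    totals = Counter()
    for block, executions in block_counts.items():
        # Blocks with no recorded instruction mix contribute nothing.
        for op, per_execution in block_ops.get(block, {}).items():
            totals[op] += executions * per_execution
    return totals


# Hypothetical profile data for a kernel with a single loop.
block_counts = {"entry": 1, "loop_body": 1024, "exit": 1}
block_ops = {
    "entry": {"load": 2},
    "loop_body": {"load": 2, "fadd": 1, "store": 1},
    "exit": {"store": 1},
}

totals = count_operations(block_counts, block_ops)
for op, n in sorted(totals.items()):
    print(op, n)
```

With this made-up data the loop body dominates every total, which is the kind of information a pass would have once per-block execution counts are available.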
I'm having a hard time figuring out how to do this in OpenCL, and in particular with LLVM (which the APP SDK uses as the CPU compiler/assembler). Does anybody have any ideas on how I could do this? LLVM seems to be tightly integrated into the framework - is there any way to swap it out for GCC?
Note: I realize that OpenCL code is inherently parallel, but I can use OpenCL sub-devices to run the kernel on a single core and so obtain accurate basic block counts.
I assume that your goal is to optimize the OpenCL device kernel. You cannot use GCC to analyze the kernel. However, the CodeXL profiler gives you various counters that are very useful for optimization. Please see the AMD APP programming guide for more information.