I'm interested in analyzing OpenCL kernels inside the compiler. In particular, I'd like to count various types of operations (e.g. the number of floating-point adds, the number of loads, etc.). In GCC this is easy: I instrument the code ("-fprofile-generate"), run the instrumented binary, and load the resulting basic block counts back into the compiler ("-fprofile-use"). Any pass I add to GCC at that point can see how many times each basic block was executed, and therefore how many times each instruction was executed.
I'm having a hard time figuring out how to do the same thing for OpenCL kernels, in particular with LLVM (which the AMD APP SDK uses as its CPU compiler/assembler). Does anyone have ideas on how to do this? LLVM seems to be tightly integrated into the framework; is there any way to swap out LLVM for GCC?
Note: I realize that OpenCL code is inherently parallel, but I can use OpenCL sub-devices to run the kernel on a single core and so obtain accurate basic block counts.