I have been using the clGetEventProfilingInfo function in my host code to get timing information for my kernels. There is a bottleneck somewhere in my code and I'd like to investigate it further. Ideally, I would like to put something inside my OpenCL kernel code that lets me measure how long various parts of the kernel take to execute, with microsecond or nanosecond precision.
I am currently testing on a CPU implementation. In the past I have used a timer.h file for something similar in my C++ code, but that file is C++-based and I don't believe I can use it in my OpenCL kernel code. Does anyone have suggestions for timing individual portions of my kernel code (not the whole kernel)?
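For reference, here is roughly the kind of host-side C++ timer I mean. My actual timer.h is not shown, so the `Timer` class below is a hypothetical stand-in, a minimal sketch assuming `std::chrono::steady_clock` as the time source:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the C++ timer.h mentioned above (the real file is not shown).
// Measures elapsed wall-clock time on the host using a monotonic clock.
class Timer {
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_;

public:
    Timer() : start_(Clock::now()) {}

    // Restart the measurement from the current instant.
    void reset() { start_ = Clock::now(); }

    // Nanoseconds elapsed since construction or the last reset().
    std::int64_t elapsed_ns() const {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start_).count();
    }

    // The same interval expressed in microseconds.
    double elapsed_us() const { return static_cast<double>(elapsed_ns()) / 1000.0; }
};
```

This only measures time on the host. It depends on the C++ standard library, which OpenCL C kernel code does not have, so it cannot simply be dropped into a kernel. That is why I'm looking for an alternative.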