I want to run OpenCL in a multi-core CPU environment. Because OpenCL is architecture-independent, it should work there, and I did manage to get it running.
However, I ran into trouble when evaluating performance and examining individual performance metrics. I used the CPU performance counters, but I cannot separate the time consumed by the OpenCL runtime from the time consumed by my own kernel. I also cannot tell how many of the cache misses belong to the runtime and how many to the kernel.
Is there any way to collect these measurements for the kernel only?