I have a question about the Application Timeline profiling mode in AMD CodeXL. I have a C++ console application running on Windows that does some calculations using OpenCL. To measure the time taken on the host, I use the following (pseudo-)code:
    auto start = std::chrono::steady_clock::now();
    /* Copy from host to device, enqueue kernels, etc. */
    queue.finish();
    auto duration = std::chrono::steady_clock::now() - start;
This works as expected when I run the application from the command line (the C++ code is compiled in release mode). I use large problem sizes, run the kernels multiple times, and take the average to reduce noise. When I run the same application through AMD CodeXL, however, the measured time is roughly 1/8th of the time measured from the command line. Does CodeXL configure the OpenCL compiler with additional flags that might influence performance?
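To make the averaging step concrete, here is a minimal sketch of what such a host-side timing harness could look like. The names `average_seconds`, `run_once`, `iterations`, and `warmup` are hypothetical placeholders for illustration only, and the warm-up run is an assumption added here (it is not part of the snippet above); it keeps one-time costs such as lazy initialization out of the average.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not from the original code): runs `run_once`
// `warmup` times untimed, then `iterations` times timed, and returns
// the mean wall-clock time per timed run in seconds.
double average_seconds(const std::function<void()>& run_once,
                       std::size_t iterations,
                       std::size_t warmup = 1)
{
    // Untimed warm-up runs (an assumption of this sketch).
    for (std::size_t i = 0; i < warmup; ++i) {
        run_once();
    }

    if (iterations == 0) {
        return 0.0;  // nothing was timed
    }

    std::chrono::duration<double> total{0.0};
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run_once();  // expected to block until the device work is done
        total += std::chrono::steady_clock::now() - start;
    }
    return total.count() / static_cast<double>(iterations);
}
```

In the scenario described here, `run_once` would be a lambda wrapping the host-to-device copy, the kernel enqueues, and the blocking `queue.finish()` call, for example `average_seconds([&] { /* copy, enqueue */ queue.finish(); }, 10)`.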
I've tried passing various flags (mad-enable, O5, etc.) to the OpenCL compilation API call embedded in my application, but none of them improved performance.
Thanks in advance for any hints.