
Huge difference between kernel execution time and total program execution time

Question asked by endoerner on Dec 12, 2013
Latest reply on Mar 6, 2014 by endoerner

Hi everyone,


I am facing a weird problem with an OpenCL code that I am developing.


In short, I have a number of kernels that are called inside a loop. The problem is that the total run time of the program is two or three times the execution time of the kernels, as measured with event profiling. For example, on an AMD FX8350 CPU + AMD Radeon HD7970 GPU (OpenCL running on the GPU) I obtained:


total time : 3275117 ms

kernel time : 1415446 ms
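
To make the comparison concrete, here is a minimal sketch of how the two figures relate. It assumes the kernel time is the sum of (end - start) over all profiled events, using the nanosecond timestamps that clGetEventProfilingInfo() reports for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The helper names (sum_event_ns, ns_to_ms, overhead_ratio) are made up for illustration and are not part of OpenCL, and uint64_t stands in for cl_ulong so the snippet builds without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum (end - start) over n profiled events.
 * Timestamps are in nanoseconds, as OpenCL event profiling reports them.
 * Hypothetical helper: uint64_t stands in for cl_ulong. */
static uint64_t sum_event_ns(const uint64_t *start_ns,
                             const uint64_t *end_ns, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(end_ns[i] >= start_ns[i]); /* an event cannot end before it starts */
        total += end_ns[i] - start_ns[i];
    }
    return total;
}

/* Convert nanoseconds to milliseconds. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}

/* Ratio of wall-clock program time to accumulated kernel time. */
static double overhead_ratio(double total_ms, double kernel_ms)
{
    assert(kernel_ms > 0.0);
    return total_ms / kernel_ms;
}
```

With the figures above, overhead_ratio(3275117.0, 1415446.0) comes out at roughly 2.3, i.e. more than half of the wall-clock time is spent outside the profiled kernels.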


I use events for profiling, and when I remove the clWaitForEvents() calls I obtain:


total time : 1857250 ms
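
Putting the three measurements side by side, the arithmetic can be sketched as follows. The struct and function names (wait_breakdown, analyze_wait) are hypothetical and exist only for this illustration; the code does nothing beyond subtracting and dividing the reported times.

```c
#include <assert.h>

/* Hypothetical container for the derived figures. */
typedef struct {
    double wait_cost_ms;    /* total with waits minus total without waits */
    double wait_cost_ratio; /* wait cost relative to accumulated kernel time */
    double nowait_ratio;    /* total without waits relative to kernel time */
} wait_breakdown;

/* Derive the cost attributable to waiting from the three measured times. */
static wait_breakdown analyze_wait(double total_wait_ms,
                                   double total_nowait_ms,
                                   double kernel_ms)
{
    assert(kernel_ms > 0.0);
    wait_breakdown b;
    b.wait_cost_ms    = total_wait_ms - total_nowait_ms;
    b.wait_cost_ratio = b.wait_cost_ms / kernel_ms;
    b.nowait_ratio    = total_nowait_ms / kernel_ms;
    return b;
}
```

For analyze_wait(3275117.0, 1857250.0, 1415446.0), the wait cost is 1417867 ms, which is almost exactly the kernel time itself (ratio of about 1.0), while the run without waits is about 1.3 times the kernel time. That might mean host-side work and kernel execution no longer overlap once the waits are in place, but the numbers alone do not prove it.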


I also tried moving the clSetKernelArg() calls out of the loop, but the time gain was minimal. Reads from and writes to the device are also minimal, so they should not be the source of the problem (I have tested this). In any case, this seems quite strange to me, as I have never seen such overhead from clWaitForEvents(). Moreover, if I run the program on an Intel CPU, I see no such difference.


Any clue about this behavior? Thanks for your help in advance!