
OpenCL: Delay in inter-kernel execution when requesting callbacks

Question asked by nfogh on Jan 21, 2019
Latest reply on Jan 24, 2019 by dipak


I have a problem with delays in kernel execution when I request callbacks from OpenCL.

In my application, I need to execute kernels at a "very" high rate (around 300 Hz), and I need a callback to my host application every time an execution has finished. However, when I register these callbacks, I see large gaps between consecutive kernel executions, even when the next kernel is already waiting in the queue.


To investigate, I created a test program that enqueues 100 kernels into a queue. In the CodeXL timeline trace, the kernels execute back to back, with gaps of only about 1 to 3 µs between them.
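The same gaps can also be measured without CodeXL: if the queue is created with `CL_QUEUE_PROFILING_ENABLE`, `clGetEventProfilingInfo` returns `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` (as `cl_ulong` nanoseconds) for each kernel's event. The sketch below assumes those timestamps have already been read into two arrays in submission order; the helper `max_inter_kernel_gap` is hypothetical and only does the arithmetic, so it does not depend on OpenCL itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given per-kernel start/end timestamps in
   nanoseconds (ordered by submission), return the largest idle gap
   between the end of one kernel and the start of the next.
   If mean_gap_ns is non-NULL, the mean gap is stored there.
   Returns 0 (and a mean of 0) when fewer than two kernels are given. */
uint64_t max_inter_kernel_gap(const uint64_t *start_ns,
                              const uint64_t *end_ns,
                              size_t count, double *mean_gap_ns)
{
    uint64_t max_gap = 0;
    double total = 0.0;

    if (mean_gap_ns)
        *mean_gap_ns = 0.0;
    if (count < 2)
        return 0;

    for (size_t i = 1; i < count; ++i) {
        /* If kernels overlap, the next one starts before the previous
           one ends; count that as a zero gap rather than underflowing. */
        uint64_t gap = start_ns[i] > end_ns[i - 1]
                           ? start_ns[i] - end_ns[i - 1]
                           : 0;
        total += (double)gap;
        if (gap > max_gap)
            max_gap = gap;
    }

    if (mean_gap_ns)
        *mean_gap_ns = total / (double)(count - 1);
    return max_gap;
}
```

Reporting both the maximum and the mean helps separate a few long stalls (such as the ones after a callback) from a uniformly slow launch rate.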

However, when I request a callback on one of the events, the subsequent kernel starts only after a delay of around 0.25 ms, even though the callback is completely empty (just a return statement).

The callback itself is executed almost immediately (within a few µs), but the next kernel waits for some time before it starts. The attached image shows the trace when I request a callback for every 5th execution of my kernel:



I calculated that this delay can account for up to 25% of my GPU processing capacity when executing at 300 Hz.
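A rough way to estimate this kind of overhead is to divide the idle time per frame by the frame period (1/300 s, about 3333 µs). The post does not say how many callback-induced gaps occur per frame, so `gaps_per_frame` below is a hypothetical parameter, and the function name is illustrative only.

```c
/* Fraction of each frame lost to idle gaps between kernels:
   lost time = gaps_per_frame * gap_us,
   frame period = 1e6 / rate_hz microseconds. */
double overhead_fraction(double rate_hz, double gap_us,
                         double gaps_per_frame)
{
    double frame_us = 1e6 / rate_hz;
    return gaps_per_frame * gap_us / frame_us;
}
```

With one 0.25 ms gap per 300 Hz frame, `overhead_fraction(300.0, 250.0, 1.0)` gives 0.075 (7.5%); a 25% loss corresponds to roughly 0.83 ms of gaps per frame, i.e. a few such gaps.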


Can anybody shed some light on this?