Large delay time before executing gpu kernels

Question asked by fv123 on May 26, 2014
Latest reply on Jun 3, 2014 by fv123



I have a program that needs to execute several kernels in succession, about 10000 times.

The execution time of the kernels themselves seems fine to me, but when I started profiling the events, the profile showed a large delay before the kernels even start executing.

Normally this wouldn't bother me too much, but since I need to do this many times, these "queue" times add up and make the total runtime explode. (The "wasted" time actually shows up as system time.)


I attached example code that shows how I handle the Enqueue calls and the profiling routine I use.

I also attached an output file that shows the (start-queued) and (end-start) times, in ms, for each of my kernels.

As you can see, the average "queued" time is longer than the execution time itself.
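
For reference, here is a minimal sketch of how the (start-queued) and (end-start) intervals can be derived from the event profiling timestamps. It is not the attached code: the struct `event_times_ns` and all function names are hypothetical, and the timestamps are assumed to have been read beforehand with `clGetEventProfilingInfo`, which reports them in nanoseconds. The sketch also splits the delay into queued-to-submit and submit-to-start, which may help to see where the time is lost.

```c
#include <stdint.h>

/* Hypothetical container for the four profiling timestamps of one event,
 * in nanoseconds. In a real program each field would be filled by
 * clGetEventProfilingInfo() with CL_PROFILING_COMMAND_QUEUED,
 * CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and
 * CL_PROFILING_COMMAND_END respectively. */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} event_times_ns;

/* Convert a nanosecond interval to milliseconds. */
double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}

/* (start - queued): total delay between the enqueue call and the
 * moment the kernel starts running on the device. */
double queue_delay_ms(const event_times_ns *t)
{
    return ns_to_ms(t->start - t->queued);
}

/* (end - start): the kernel execution time itself. */
double exec_time_ms(const event_times_ns *t)
{
    return ns_to_ms(t->end - t->start);
}

/* (submit - queued): time the command sat in the host-side queue
 * before it was handed to the device. */
double host_queue_ms(const event_times_ns *t)
{
    return ns_to_ms(t->submit - t->queued);
}

/* (start - submit): time between submission to the device and the
 * actual start of execution. */
double device_wait_ms(const event_times_ns *t)
{
    return ns_to_ms(t->start - t->submit);
}
```

Summing `queue_delay_ms` and `exec_time_ms` over all events of one iteration gives the per-iteration totals that the averages above are based on.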


Note: the OpenMP parameters aren't active right now (NUM_THREAD_ID=0), so only one queue etc. is used.

2nd Note: The clWaitForEvents calls are only there to ensure the kernels have finished before the profiling data is read. Removing the events and waits doesn't improve the wall time.


Is there anything I can change in the Enqueue calls, or anywhere else, to effectively reduce these "queue" times?