I am working on a problem that needs many kernels (6 or more), and I found that the time between two kernels is very large. E.g. after the first kernel finishes executing, there is a gap of about 2 ms before the next kernel begins executing. Is there any technique to reduce the time between two kernel executions?
Do you enqueue multiple kernels and then call clFinish?
With my kernels, each later kernel needs the result from the previous one. That means after the first kernel has finished, the CPU needs to read the data and do some calculation, then set the next kernel's input params, and only then does the next kernel begin to execute. Can I enqueue multiple kernels in this case?
Well, try to batch as many commands as you can. If you have an in-order queue, then the previous kernel execution or read/write is always finished before the next command starts.
I don't know your program, but you can enqueue multiple kernel executions and readbacks at once. You just allocate a separate memory location for each readback. Of course, this is possible only when further kernel execution does not depend on those CPU calculations.
If not, you can do something like this: kernel1, kernel2, read, kernel1, kernel2, read, ... , kernel1, kernel2, read, clFinish
It might make more sense to enqueue a CPU kernel to do the CPU-side work, or call your CPU function from a native kernel. That way it can be enqueued in the OpenCL queuing system and any scheduling (and batching) will be managed by the runtime. The runtime will then handle data movement as necessary.