I am working on a problem that requires me to write many kernels (6 or more), and I have found that the gap between two consecutive kernels is very large. For example, after the first kernel finishes executing, about 2 ms pass before the next kernel begins executing. Is there any technique to reduce the time between two kernel executions?