I know the question may seem strange, but still: is it possible to write (and use) a kernel that runs in a closed loop, that is, one that is launched once and then keeps running? The only problem I can foresee is with graphics, but I can run it on my secondary GPU. Why? I want to get rid of even the smallest overhead from kernel queuing. Kernel execution would be controlled by the host only through fine-grained SVM (and through atomic operations from inside the kernel). Do you think this is possible?
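
To make the idea concrete, here is a rough sketch of the handshake I have in mind. It is only a CPU-side simulation: a `std::thread` stands in for the persistent kernel, and `std::atomic` stands in for atomics on a fine-grained SVM buffer. The `Mailbox` layout, the command values and the function names are all made up for illustration; none of this is OpenCL API, and whether a real GPU kernel can spin like this is exactly what I am asking.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical command values written by the host.
enum : uint32_t { CMD_IDLE = 0, CMD_WORK = 1, CMD_QUIT = 2 };

// Hypothetical mailbox shared by "host" and "kernel".
// Assumption: on a real device this would live in a fine-grained SVM
// buffer that supports atomics; here it is ordinary process memory.
struct Mailbox {
    std::atomic<uint32_t> command{CMD_IDLE}; // written by host, cleared by worker
    std::atomic<uint32_t> done{0};           // number of completed requests
    int input = 0;                           // payload, published via `command`
    int output = 0;                          // result, published via `done`
};

// Stand-in for the persistent kernel: loops until told to quit.
void persistent_worker(Mailbox& mb) {
    for (;;) {
        uint32_t cmd = mb.command.load(std::memory_order_acquire);
        if (cmd == CMD_QUIT) {
            return;
        }
        if (cmd == CMD_WORK) {
            mb.output = mb.input * 2;  // placeholder for the real work
            mb.command.store(CMD_IDLE, std::memory_order_relaxed);
            // The release increment publishes `output` and the IDLE store.
            mb.done.fetch_add(1, std::memory_order_release);
        } else {
            std::this_thread::yield(); // idle: nothing to do yet
        }
    }
}

// Stand-in for the host: starts the worker once, then feeds it work
// through the mailbox only, with no further "launches".
std::vector<int> run_closed_loop_demo(const std::vector<int>& inputs) {
    Mailbox mb;
    std::thread worker(persistent_worker, std::ref(mb));

    std::vector<int> results;
    uint32_t expected_done = 0;
    for (int x : inputs) {
        mb.input = x;
        mb.command.store(CMD_WORK, std::memory_order_release);
        ++expected_done;
        // Wait until the worker reports this request as finished.
        while (mb.done.load(std::memory_order_acquire) != expected_done) {
            std::this_thread::yield();
        }
        results.push_back(mb.output);
    }

    mb.command.store(CMD_QUIT, std::memory_order_release);
    worker.join();
    return results;
}
```

Calling `run_closed_loop_demo({1, 2, 3})` from `main` starts the worker once and pushes three requests through the mailbox. The single launch followed by polling is the part I would like to reproduce on the GPU.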