I would really appreciate some hints on multi-task preemption on the GPU.
Unfortunately, there is no preemption mechanism.
The closest you can get is to break your task T1 down into many smaller sub-tasks (say, by splitting up your domain of execution) and launch those sub-tasks at appropriate intervals. You can then insert T2 between them as needed.
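To make the idea concrete, here is a minimal host-side sketch of that scheme. It is only an illustration under assumptions: plain Python functions stand in for kernel launches, `t1_chunk` and `run_t1_with_gaps` are hypothetical names, and `t2_arrivals` is a made-up way of saying "these T2 requests showed up while chunk k was running". On a real GPU each call to `t1_chunk` would be one launch over a sub-range of the domain.

```python
from collections import deque

def t1_chunk(data, out, start, stop):
    # Stand-in for one kernel launch covering indices [start, stop) of T1's domain.
    for i in range(start, stop):
        out[i] = data[i] * 2.0

def run_t1_with_gaps(data, chunk_size, t2_arrivals):
    """Run T1 chunk by chunk and service queued T2 work between chunks.

    t2_arrivals (hypothetical): dict mapping a chunk index to a list of
    (name, callable) T2 requests that arrive while that chunk is running.
    Returns T1's output and a log of the order in which work was issued.
    """
    out = [0.0] * len(data)
    log = []
    t2_queue = deque()
    for chunk_index, start in enumerate(range(0, len(data), chunk_size)):
        stop = min(start + chunk_size, len(data))
        t1_chunk(data, out, start, stop)
        log.append(("T1", start, stop))
        # T2 cannot interrupt a running chunk; it only gets in at a chunk boundary.
        t2_queue.extend(t2_arrivals.get(chunk_index, []))
        while t2_queue:
            name, t2 = t2_queue.popleft()
            t2()
            log.append(("T2", name))
    return out, log
```

Note that a T2 request never interrupts a chunk in this sketch; it waits for the next chunk boundary, so the chunk size sets the worst-case delay.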
Now, the GPU relies on a large domain of execution (many, many threads) to achieve its speedups, so at finer levels of granularity there is a trade-off between timing precision and speedup.
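A toy cost model may help put numbers on that trade-off. This is a rough sketch, not a measurement: it assumes T1's work divides evenly across chunks, that every launch pays the same fixed overhead, and it ignores the loss of occupancy that very small launches suffer on real hardware. The function name and all parameters are hypothetical.

```python
def chunking_tradeoff(total_work_ms, launch_overhead_ms, n_chunks):
    """Return (total T1 time, worst-case wait for T2), both in milliseconds.

    Assumed model: each chunk costs an equal share of the work plus a fixed
    launch overhead, and T2 can only start at a chunk boundary.
    """
    chunk_ms = total_work_ms / n_chunks + launch_overhead_ms
    total_ms = n_chunks * chunk_ms
    worst_wait_ms = chunk_ms  # T2 arrives just after a chunk has started
    return total_ms, worst_wait_ms
```

With an assumed 100 ms of work and 0.5 ms of overhead per launch, going from 10 chunks to 100 chunks cuts the worst-case wait from 10.5 ms to 1.5 ms, but raises the total T1 time from 105 ms to 150 ms in this model.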
Why do you need preemption, btw?
Well, the reason I'm interested in GPU preemption is CPU-GPU co-processing optimization. Consider this loop:
(1) I have a main GPU task T1 that works on a data sample N.
(2) In the meantime, I have the CPU working on the previous result of the GPU computation, i.e. sample N-1.
Doing it this way, I can take full advantage of the GPU and CPU working concurrently with only one synchronization point, i.e. at the beginning of each loop iteration.
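For readers who want to see the shape of that loop, here is a minimal sketch. It is an assumption-laden stand-in: a single-worker thread pool plays the role of the GPU, and `gpu_task_t1`, `cpu_postprocess` and `pipeline` are hypothetical names with placeholder bodies, not my actual processing.

```python
from concurrent.futures import ThreadPoolExecutor

def gpu_task_t1(sample):
    # Placeholder for the main GPU task T1 on one data sample.
    return sample * sample

def cpu_postprocess(result):
    # Placeholder for the CPU work done on a finished GPU result.
    return result + 1

def pipeline(samples):
    outputs = []
    # One worker thread stands in for the GPU: it runs one T1 at a time.
    with ThreadPoolExecutor(max_workers=1) as gpu:
        pending = None
        for sample in samples:
            if pending is None:
                # First iteration: nothing to wait for yet, just start T1.
                pending = gpu.submit(gpu_task_t1, sample)
                continue
            prev_result = pending.result()             # the only sync point, at loop start
            pending = gpu.submit(gpu_task_t1, sample)  # "GPU" starts sample N
            outputs.append(cpu_postprocess(prev_result))  # CPU handles sample N-1 meanwhile
        if pending is not None:
            # Drain the last result once the loop is over.
            outputs.append(cpu_postprocess(pending.result()))
    return outputs
```

The point of the structure is that the CPU blocks only once per iteration, on the result of the previous sample, and the next T1 is already in flight while the CPU does its share.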
Now the point is that while the CPU is doing its own work, it could take advantage of short tasks T2 run on the GPU, the processing involved in T2 being more suitable for the GPU than for the CPU.
The issue is that T1 is a batch and cannot really be split into pieces. If it were split, I'd lose most of the benefits of concurrent GPU-CPU execution, since synchronization schemes would have to be inserted.
So this is why some form of GPU preemption (on a different context, for example) would be highly desirable.