
Archives Discussions

Journeyman III

GPU task order of completion for different contexts



I understand that within one context, kernels complete in FIFO order, i.e. in the same order in which the requests were filed in the command queue.

Consider two contexts on the same device:

(a) Assume that for context 1, a batch of kernel computations has been launched by filling queue 1 with calls to kernels k1, k2 and k3 and then issuing a flush.
Assume this task T1 (= k1+k2+k3) takes, say, 1 s to complete.

(b) Concurrently I have another task (T2), comprising kernel k4, that would typically take, say, 1 ms to execute, and I would like this task to complete quickly (i.e. without waiting for T1 to finish).


(1) Is there a way, while T1 is executing, to preempt the GPU by sending a call for T2 in context 2?

(2) In that case, is it safe to assume that T2 could complete before T1?

I tend to think some GPU preemption mechanism should exist, but I don't know whether this is the way to do it!



3 Replies
Journeyman III


I would really appreciate some hints on multi-task preemption on the GPU.

Many thanks



Unfortunately, there is no preemption mechanism.

The closest you can get is to break your task T1 down into many smaller sub-tasks (say, by partitioning your domain of execution) and launch those sub-tasks at appropriate intervals. You can then insert T2 in between as needed.

Now, the GPU relies on a large domain of execution (many, many threads) to achieve its speedups, so at finer levels of granularity there is a trade-off between timing precision and speedup.

Why do you need preemption, btw?


Thanks Udeepta,

Well, the reason I'm interested in GPU preemption is CPU-GPU co-processing optimization. Consider this loop:

(1) I have a main GPU task T1 that works on data sample N.

(2) In the meantime, the CPU is working on the previous GPU result, i.e. sample N-1.

This way I can take full advantage of the GPU and CPU working concurrently, with only one synchronization point, at the beginning of each loop iteration.

Now the point is that while the CPU is doing its own work, it could benefit from short tasks T2 being run on the GPU, since the processing involved in T2 is more suitable for the GPU than for the CPU.

The issue is that T1 is a batch and cannot really be split into pieces; if it were, I'd lose most of the benefit of concurrent GPU-CPU execution, since synchronization schemes would have to be inserted.

So this is why having some form of GPU preemption (on a different context, for example) would be highly desirable.