How does the Brook+ runtime handle multiple processes using the GPU?

Discussion created by Raistmer on Sep 8, 2009
Latest reply on Sep 10, 2009 by Raistmer
Some process-preemption-related questions.

Let's consider two processes, A and B, that both use the GPU.
Process A starts, allocates some buffers (global streams) on the GPU, executes a few kernels, and is then preempted by process B for some time.

1) Will process B be able to allocate almost all GPU memory, or only the part left free after process A's allocations?

After control returns to process A:
2) Will process A's GPU buffers be refilled from host memory, or does their data stay in GPU memory while process A is preempted?
3) Do process A's kernels need to be recompiled?
4) Do process A's kernels need to be loaded onto the GPU again?

The reasons I ask these questions:
1) calDeviceGetStatus() returns the same amount of free GPU memory (480 MB of free GPU RAM on my HD4870 512 MB) no matter how many other GPU-using apps are running.
2) While running concurrently with another GPU-using app, I see an increase of up to a few orders of magnitude (up to seconds!) in the maximum (and mean) run time of some blocks (a block includes a stream read and a kernel call). Each block is wrapped in a mutex shared by all running GPU-related apps, so there are no GPU context switches inside such a block.

It looks like, after receiving control of the GPU again, the app has to wait while the Brook+ runtime restores the GPU state somehow, and this restoration involves memory copies or kernel recompilation. But why should the app have to wait seconds to continue?

EDIT: And what is the recommended practice for GPU sharing?
Should I use a fine-grained GPU lock (as I do now), where only a small part of the whole task is completed before releasing the GPU to other apps, or should coarse-grained locking be used, where the GPU is held by the same process until its allocated buffers are no longer needed, so that the app can reallocate all of them on its next GPU acquisition?