
jean-claude
Journeyman III

Concurrent GPU accesses

Hi,

Assume a program has just filled and flushed a command queue to the GPU with a batch of consecutive kernel executions. To take full advantage of the available processing power, the program can then still perform CPU work while waiting for the GPU to complete.
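
For illustration, here is a minimal sketch of that pattern. I write it against the OpenCL C API purely as an assumption (any stream API with a non-blocking flush and an event wait would do); `queue`, `kernel`, and `do_cpu_work()` are hypothetical placeholders.

#include <CL/cl.h>

extern void do_cpu_work(void);   /* placeholder for the CPU-side work */

void overlap_example(cl_command_queue queue, cl_kernel kernel,
                     size_t global_size)
{
    cl_event done;

    /* Enqueue the batch (a single kernel here for brevity). */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, &done);

    /* Hand the commands to the driver without blocking the CPU. */
    clFlush(queue);

    /* The CPU stays busy while the GPU works through the batch. */
    do_cpu_work();

    /* Block only at the point where the GPU output is required. */
    clWaitForEvents(1, &done);
    clReleaseEvent(done);
}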

Meanwhile, while all of this is executing, the operating system obviously still handles screen refreshes and other programs' accesses to the GPU...

Questions:

(1) Is there a form of scheduler for GPU allocation?

(2) What happens to the kernel queue issued by the program? Is its execution interleaved with the other monitoring tasks imposed by the OS? And what can be done to set up some form of concurrent sharing of the GPU?

(3) On Vista32, what exactly does the TdrDelay parameter in the Windows registry mean? I.e., is this maximum delay the maximum time allowed between the last command-queue flush and the first subsequent task completion from the GPU?

(4) Overall, what kind of concurrent-access monitoring is performed under the hood by the OS and the GPU driver, and how can it be exploited to emulate concurrent GPU access within a single program?

I understand that this is a little bit tricky... but this forum is for skilled and hungry programmers, isn't it?


Jean-claude,

1) There is no real scheduling available to the programmer beyond submitting command buffers to the GPU for execution. After a command buffer is submitted, all tasks inside it are executed in order. When and how a command buffer is submitted to the GPU is controlled by the driver, not the user, apart from the flush commands.
2) A single command buffer can be considered to execute atomically. However, again, what constitutes a command buffer is controlled by the driver.
3) Not 100% sure, but it might mean the amount of time before the OS interrupts the driver to take back control from what it considers a 'frozen' program (see the registry sketch after this list).
4) Can you rephrase this to be more CAL specific and not OS specific?
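
For reference, TdrDelay lives under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers and, according to Microsoft's TDR documentation, holds the number of seconds the OS lets a GPU task run before triggering a timeout and reset (the Vista default is 2). A minimal sketch that reads the value, assuming a Vista-or-later Windows SDK:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD delay = 0, size = sizeof(delay);

    /* TdrDelay is a REG_DWORD holding seconds; if absent, the OS
     * default (2 seconds on Vista) applies. */
    LONG rc = RegGetValueA(
        HKEY_LOCAL_MACHINE,
        "SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
        "TdrDelay", RRF_RT_REG_DWORD, NULL, &delay, &size);

    if (rc == ERROR_SUCCESS)
        printf("TdrDelay = %lu seconds\n", (unsigned long)delay);
    else
        printf("TdrDelay not set; the OS default applies\n");
    return 0;
}

If that documentation is right, the delay bounds how long a single GPU submission may run without responding, not the time between a flush and the first completion.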

Hi Micah,

Thanks for your reply.

Actually, my question is still related to a previous one (here) that I submitted last week.

The point is that I'm trying to figure out how to open a "fast lane" for executing short GPU tasks while a large batch of kernels has already been submitted to the GPU.

This situation typically occurs when the CPU (which has carried on with its own work while the GPU processes the main batch) now needs short assistance (i.e. a short kernel execution) from the GPU. If the new task has to wait until the batch completes, the CPU stalls waiting for both to finish, which completely wipes out the benefit of concurrent execution.

It would be more productive to be able to force the insertion of a new task between the atomic units of batch 1, i.e. some form of preemption, even if the exact timing is imprecise.

The GPU already knows the completion status of the task queue for task 1, so it should be possible to feed task 2 in between the atoms...
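
Here is a hypothetical sketch of what I mean, written with two OpenCL command queues purely as an assumption (`big_batch` and `short_task` are placeholder kernels). Nothing guarantees the driver will actually interleave the short kernel with the running batch; without hardware preemption it may simply wait for the batch to finish.

#include <CL/cl.h>

void fast_lane(cl_context ctx, cl_device_id dev,
               cl_kernel big_batch, cl_kernel short_task,
               size_t big_size, size_t small_size)
{
    cl_int err;
    cl_command_queue batch_q = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_command_queue fast_q  = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_event done;

    /* The main batch goes to one queue and is flushed asynchronously. */
    clEnqueueNDRangeKernel(batch_q, big_batch, 1, NULL,
                           &big_size, NULL, 0, NULL, NULL);
    clFlush(batch_q);

    /* The short helper kernel goes to a second queue; when it actually
     * runs relative to the batch is up to the driver and hardware. */
    clEnqueueNDRangeKernel(fast_q, short_task, 1, NULL,
                           &small_size, NULL, 0, NULL, &done);
    clFlush(fast_q);

    clWaitForEvents(1, &done);
    clReleaseEvent(done);
    clReleaseCommandQueue(fast_q);
    clReleaseCommandQueue(batch_q);
}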

In my opinion this will soon become mandatory: if we assume that in the not-so-distant future a PC will run several applications taking advantage of GPU stream co-processing, then some form of scheduling (task-slot assignment, GPU time sharing, ...) will obviously have to be supported jointly by the OS and the GPU driver.

I leave it up to your thoughts.

Jean-Claude



Jean-claude,
This is currently not possible, as we do not have preemption on the GPU. Once a task is submitted, it finishes in order with no interruptions (except in the case of the OS/driver triggering a reset).

Also, GPUs are not good at executing short/small tasks, because of the cost of the memory transfer and because a small task does not fill the hardware with enough data.
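
To illustrate the point, a sketch (again assuming an OpenCL-style API rather than CAL) contrasting many tiny launches with one batched launch over the same items:

#include <CL/cl.h>

/* Anti-pattern: one launch per item; every launch pays submission
 * overhead and leaves most of the hardware idle. */
void per_item(cl_command_queue q, cl_kernel k, size_t n_items)
{
    size_t one = 1;
    for (size_t i = 0; i < n_items; ++i)
        clEnqueueNDRangeKernel(q, k, 1, NULL, &one, NULL, 0, NULL, NULL);
    clFinish(q);
}

/* Better: one launch covering all items, amortizing the overhead and
 * giving the hardware enough work to fill its units. */
void batched(cl_command_queue q, cl_kernel k, size_t n_items)
{
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n_items, NULL, 0, NULL, NULL);
    clFinish(q);
}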