4 Replies Latest reply on Jan 12, 2009 2:16 PM by MicahVillmow

    Concurrent GPU accesses

    jean-claude

      Hi,

      Assume a program has just filled and flushed a command queue to the GPU to execute a batch of consecutive kernels. To take full advantage of the available processing power, the program can then perform a number of CPU tasks while waiting for the GPU to finish.

      While all of this is executing, the operating system obviously still has to refresh the screen and give other programs access to the GPU...

      Questions:

      (1) Is there a form of scheduler for GPU allocation?

      (2) What happens to the kernel queue issued by the program? Is its execution interleaved with other monitoring tasks imposed by the OS? If so, what can be done to set up some form of concurrent sharing of the GPU?

      (3) On Vista32, what exactly does the TdrDelay parameter in the Windows registry mean? Is this maximum delay related to the maximum time elapsed between the last command queue flush and the first subsequent task completion from the GPU?

      (4) Overall, what kind of concurrent access monitoring is performed under the hood by both the OS and the GPU driver, and how can it be used smartly to emulate concurrent GPU access in a specific program?

      I understand that this is a little bit tricky... but this forum is for skilled and hungry programmers isn't it?
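
The overlap pattern described at the top of this post (submit a batch, then keep the CPU busy until the GPU reports completion) can be sketched as follows. This is a plain Python stand-in, not CAL code: the "GPU" is simulated by a worker thread, and the names `overlap`, `kernels`, and `cpu_tasks` are hypothetical, chosen only for illustration.

```python
import threading

def overlap(kernels, cpu_tasks):
    """Run a batch of kernels on a simulated device while the host does CPU tasks.

    kernels and cpu_tasks are lists of zero-argument callables (hypothetical
    stand-ins for GPU kernels and CPU work items).
    """
    gpu_results = []
    done = threading.Event()

    def device():
        # The simulated device runs the queued kernels strictly in order.
        for kernel in kernels:
            gpu_results.append(kernel())
        done.set()

    worker = threading.Thread(target=device)
    worker.start()                               # stands in for the queue flush

    cpu_results = [task() for task in cpu_tasks]  # CPU work overlaps the batch

    done.wait()                                  # block only when CPU work is finished
    worker.join()
    return cpu_results, gpu_results
```

For example, `overlap([lambda: 1, lambda: 2], [lambda: "a"])` returns the CPU results and the in-order kernel results as two separate lists.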

        • Concurrent GPU accesses
          MicahVillmow
          Jean-claude,

          1) There is no real scheduling available to the programmer other than submitting command buffers to the GPU for execution. Once a command buffer is submitted, all tasks inside it are executed in order. When and how a command buffer is submitted to the GPU is controlled by the driver, not by the user, apart from the flush commands.
          2) A single command buffer can be considered to execute atomically. However, again, what constitutes a command buffer is controlled by the driver.
          3) Not 100% sure, but it might mean the amount of time before the OS interrupts the driver to get control from what it considers a 'frozen' program.
          4) Can you rephrase this to be more CAL specific and not OS specific?
            • Concurrent GPU accesses
              jean-claude

              Hi Micah,

              Thanks for your reply.

              Actually my question is still related to a previous one (here) I submitted last week.

              The point is that I'm trying to figure out how to open a "fast lane" for executing short GPU tasks when a large batch of kernels has already been submitted to the GPU.

              This situation typically occurs when the CPU (which has continued its own work while the GPU processes the main batch) now needs brief assistance from the GPU, i.e. a short kernel execution. If the new task has to wait until the batch is completed, the CPU is stalled waiting for both to complete. This completely negates the benefit of concurrent execution.

              It would be more productive to be able to force the insertion of a new task between the atomic units of batch 1, i.e. some form of preemption, even if the exact timing is not precise.

              The GPU already knows the completion status of the task queue for task 1, so it should be possible to feed task 2 in between the atoms...

              In my opinion this will soon become mandatory: if we assume that in the not-so-distant future a PC will run several applications that take advantage of GPU stream co-processing, then obviously some form of scheduling (task slot assignment, GPU time sharing, ...) has to be jointly supported by the OS and the GPU driver.

              I leave it up to your thoughts.

              Jean-Claude

               

               

               

              • Concurrent GPU accesses
                MicahVillmow
                Jean-claude,
                This is currently not possible, as we do not have pre-emption on the GPU. Once a task is submitted, it runs to completion in order with no interruptions (except when the OS/driver triggers a reset).

                Also, GPUs are not good at executing short or small tasks: the cost of memory transfer dominates, and there is not enough data to fill the hardware.
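
To make the consequence of in-order, non-preemptive execution concrete, here is a small host-side timing model in plain Python (not CAL code). It assumes, hypothetically, that the host submits the batch in chunks of `chunk_size` kernels and checks for a pending short task only between chunks. As noted earlier in the thread, the driver decides what actually constitutes a command buffer, so real behaviour may differ, and extra submissions add overhead that this model ignores. The function name and parameters are illustrative only.

```python
def short_task_finish(kernel_times, chunk_size, arrival, short_duration):
    """Finish time of a short task that arrives while a batch is running.

    Model assumptions (not taken from any real API):
    - each chunk of `chunk_size` kernels runs to completion, uninterrupted;
    - between chunks the host checks whether the short task has arrived
      and, if so, submits it ahead of the remaining chunks.
    All times are in the same integer unit, e.g. milliseconds.
    """
    clock = 0
    for start in range(0, len(kernel_times), chunk_size):
        if arrival <= clock:
            # Short task is waiting at a chunk boundary: run it now.
            return clock + short_duration
        clock += sum(kernel_times[start:start + chunk_size])
    # Batch finished (or task arrived later): run the short task afterwards.
    return max(clock, arrival) + short_duration


batch = [1000] * 10  # ten kernels of 1000 ms each

# One monolithic submission: the short task waits for the whole batch.
whole = short_task_finish(batch, chunk_size=10, arrival=2500, short_duration=100)

# Chunks of two kernels: the short task runs at the next chunk boundary.
chunked = short_task_finish(batch, chunk_size=2, arrival=2500, short_duration=100)
```

In this model the short task finishes at 10100 ms with a single submission and at 4100 ms with two-kernel chunks. The smaller the chunk, the shorter the wait, at the cost of more submissions.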