
Multiple GPUs and concurrent kernel execution

Question asked by uvedale on Jul 13, 2012
Latest reply on Jul 25, 2012 by Wenju



I'm trying to distribute work to two different GPUs efficiently by using dynamic scheduling.

My first attempt works, but one of the GPUs sits idle at one point for no apparent reason.


The basic scheduling algorithm is as follows:

Main func:

    Loop through GPUs (loop control i)

        lock work queue

        pop item off work queue

        unlock work queue

        set work item's target device to i

        call enqueuework function (work item)

    Wait for queue to become empty

    WaitForEvents(wait on all reads to complete)


EnqueueWork function (work item):

    create required buffers

    enqueueWrite on buffers using writeQueue

    enqueueMarker on writeQueue

    flush write queue

    create and enqueue kernels on execQueue (dependent on the writeQueue marker above)

    set callback on kernel completion to RunComplete function

    flush exec queue

    enqueueRead on readQueue (depends on kernel completion)

    set callback on read complete to ReadComplete function

    flush read queue


RunComplete function:

    Pops the next item off the work queue and calls the EnqueueWork function with it


ReadComplete function:

    Creates a thread to write the results to file


Note: All OpenCL calls are asynchronous and each device has its own set of queues.


The picture attached is the execution profile. As you can see from the image, the Cayman device isn't doing anything for ~3s, yet the buffers were written and the kernel was enqueued at the expected time. It only started when the Tahiti device finished its kernel, yet the inverse doesn't apply (Tahiti starts new kernels while Cayman is still running). Any ideas as to why this is?