8 Replies Latest reply on Feb 24, 2011 3:50 PM by rick.weber

    Running multiple independent kernels on a single GPU

    richeek.arya

      Hi all,

      This question has been discussed on multiple forums, but I did not get a clear picture because people have conflicting opinions, so I am asking it again.

      Suppose my current kernel does not use the GPU's resources completely, say only 10 threads are running at a time. To harness the full computational power, can I run another independent kernel alongside the current one on the same GPU?

      My guess is that it is not possible. Somewhere in AMD's GPU literature I read that if I try to run two kernels simultaneously, they will automatically be serialized. However, OpenCL lets you create multiple independent command queues, which don't require any synchronization between them. So if I put the first kernel in one queue and the second kernel in another, would both of them run simultaneously?

      I have a Radeon 5450 graphics card, in case anyone is curious.

      Thanks!

        • Running multiple independent kernels on a single GPU
          nou

          From a theoretical point of view, when you enable an out-of-order queue, enqueueing two kernels without dependencies should let them run in parallel. Likewise, when you create two queues on the same device, the kernels can again execute in parallel.

          But AFAIK there is currently no support for concurrent kernels on AMD GPUs.

          • Running multiple independent kernels on a single GPU
            HarryH

            Here is the pseudocode for how I did it using OpenMP. In order to run multiple instances of the same kernel, you need to create multiple kernel objects in different threads, because clSetKernelArg is not thread-safe. This worked on my system (see below).

            #include <omp.h>

            find / select platform
            create context on GPU device
            find / select devices associated with context
            read kernel source / create program
            build program
            create any memobjs to be shared RO among kernels in different threads

            #pragma omp parallel
            {
                create the command queue
                create the kernel objects
                create private memobjs for this thread
                set kernel arguments
                execute kernels
                release kernels
                release private memobjs
                release commandqueue
                free any per thread malloc'ed buffers etc.
            }

            release shared memobjs
            release program
            release context
            free globally allocated buffers
            exit

              • Running multiple independent kernels on a single GPU
                perhaad

                If you create kernels and issue them on different queues, the 2nd kernel will not execute until the 1st one is done (on GPUs). You could use multiple command queues to overlap I/O, but enqueueing kernels on separate queues wouldn't help.

                I'm adding a link to the previous thread just in case. It's pretty recent (about 3 months old).

                http://devforums.amd.com/devforum/messageview.cfm?catid=390&threadid=142485&messid=1187010&parentid=0&FTVAR_FORUMVIEWTMP=Branch

                 

                  • Running multiple independent kernels on a single GPU
                    rick.weber

                    Does the driver maintain an ordering on PCI transfers with respect to the kernels that read and write the buffers involved, or is it a crapshoot? For example, suppose I have kernel A running that uses buffer B. Then I enqueue a task in the second queue to write to buffer B. Will the DMA happen alongside execution (bad), will it block until A completes, or will the copy block until ALL kernels issued before it that reference the buffer, regardless of queue, complete? The first and second behaviors can give you race conditions, while the third can give you deadlocks.

                      • Running multiple independent kernels on a single GPU
                        LeeHowes

                        This doesn't happen right now with OpenCL. There are technical reasons that I can't go into. It does work fine in DX. AMD GPUs currently run multiple kernels in an uncontrollable way: they do not allow you to split the device. The device does run complicated task graphs happily in the background because that's how graphics code works. Unfortunately, right now it only works within the restrictions of the DX API rather than the more general OpenCL API.

                        • Running multiple independent kernels on a single GPU
                          nou

                          rick.weber: if you don't have an out-of-order queue, you don't get any overlap between execution and transfers. Each enqueued item completes before the next one starts. In-order queues have implicit synchronization.

                          With out-of-order and/or multiple queues you must ensure proper synchronization with event objects.

                          LeeHowes: do you mean that with DX, and maybe OpenGL, it can run concurrent shaders, like a vertex and a pixel shader concurrently? Or does that also apply to DirectCompute?

                            • Running multiple independent kernels on a single GPU
                              rick.weber

                               

                              Originally posted by: nou

                              rick.weber: if you don't have an out-of-order queue, you don't get any overlap between execution and transfers. Each enqueued item completes before the next one starts. In-order queues have implicit synchronization.

                              With out-of-order and/or multiple queues you must ensure proper synchronization with event objects.

                               

                              So out-of-order execution and multiple queues are equivalent, then, except that in the multi-queue in-order case you can impose some ordering without events. I'm contemplating adding another queue to clUtil when you enable OOO execution mode. One queue will be dedicated to data transfers and one will be dedicated to kernels in this mode, allowing you to overlap communication and execution. Then you use callbacks from the asynchronous versions of clUtil calls to manage dependencies (e.g. don't enqueue it if it can't run immediately). If I recall correctly, AMD OpenCL doesn't currently support out-of-order queues.

                      • Running multiple independent kernels on a single GPU
                        MicahVillmow
                        nou,
                        AMD hardware has been able to run concurrent shaders since R6XX in graphics mode.