5 Replies Latest reply on May 3, 2012 1:44 PM by Meteorhead

    OpenGL / OpenCL concurrency, GPU resource sharing

    fred_em

      Hi,

       

      I have a question regarding OpenGL/OpenCL concurrency. If I:

       

      - create and bind an OpenGL context,

      - create and bind an OpenCL context - onwards, the OpenGL and OpenCL contexts will both remain bound at all times,

      - launch an OpenCL kernel

      - immediately after, draw a large OpenGL scene (my scenario is a scene that makes heavy use of shaders)

       

      will my GPU process the OpenCL kernel and the OpenGL shaders concurrently, that is, sharing GPU resources? I have a situation in which I absolutely need both to be processed at the same time. The question is: will I have to target the GPU or the CPU for my kernel?

       

      Thanks,

      Fred

        • Re: OpenGL / OpenCL concurrency, GPU resource sharing
          Meteorhead

          The answer is: no, they will not.

           

          Let's assume your contexts are bound at all times (although OpenCL contexts are never really bound, but let's put that aside for a moment). If you launch a set of OpenCL kernels (and flush your command queue, as you should once you're done with some well-defined task), your kernels will be put onto the driver's stack of kernels. Immediately after, you issue your set of OpenGL draw calls, and they will also be placed onto the driver's stack.

           

          I'm not 100% sure of the mechanics from here on, but the result is guaranteed: either the host-side driver queue or the thread dispatch processor on the device will prioritize, and will not launch display kernels (OpenGL) before all compute kernels (OpenCL) on the stack have executed.

           

          While this is somewhat troubling, in reality you hardly ever need your kernels to run concurrently. They most certainly do not communicate: your OpenGL call is only usable once it's done (the entire image has been rendered), and likewise your OpenCL results are only useful once they finish. If you want them to run really concurrently and race for resources, the only thing you'll accomplish is to break your image even more visibly (the same effect as not using V-Sync), because there will be a bigger delay between parts of the image being rendered.

           

          So this is how OpenCL-OpenGL "concurrency" works. There are other methods of letting the two APIs race for resources, but single-thread, lock-step concurrency would require them to run REALLY concurrently. And indeed, they will not (ever). The HD7xxx series is capable of running two different kernels at once, but not two different types (display/compute). The new NV Kepler monster (most likely in the Tesla product line) will be able to do kernel prioritizing, at least that's what the press releases say.

            • Re: OpenGL / OpenCL concurrency, GPU resource sharing
              fred_em

              Thanks for your reply. I'm not completely sure what you mean, though.

               

              First off, my OpenGL scene and OpenCL kernel need to communicate. The OpenCL kernel needs to feed OpenGL with some data. As soon as it can.
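
              For reference, the way I pass data from OpenCL to OpenGL is through a shared buffer object. A rough sketch of my setup (the names `ctx`, `queue`, `kernel` and `vbo` stand in for objects created elsewhere, and error checking is omitted); I am assuming a context created with the cl_khr_gl_sharing extension:

              ```c
              #include <CL/cl.h>
              #include <CL/cl_gl.h>
              #include <GL/gl.h>

              /* Sketch: ctx, queue, kernel and the GL buffer vbo are assumed
                 to be set up already, with CL/GL sharing enabled on ctx. */
              void update_vbo_then_draw(cl_context ctx, cl_command_queue queue,
                                        cl_kernel kernel, GLuint vbo, size_t n)
              {
                  cl_int err;
                  /* Wrap the GL vertex buffer as a CL memory object
                     (normally done once, at init time). */
                  cl_mem clbuf = clCreateFromGLBuffer(ctx, CL_MEM_WRITE_ONLY, vbo, &err);

                  glFinish();  /* GL must be done with the buffer first */
                  clEnqueueAcquireGLObjects(queue, 1, &clbuf, 0, NULL, NULL);
                  clSetKernelArg(kernel, 0, sizeof(cl_mem), &clbuf);
                  clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);
                  clEnqueueReleaseGLObjects(queue, 1, &clbuf, 0, NULL, NULL);
                  clFinish(queue);  /* CL must be done before GL reads the buffer */

                  /* ... now bind vbo and issue the draw calls ... */
                  clReleaseMemObject(clbuf);
              }
              ```

              As you can see, the acquire/release handshake forces the two APIs to take turns on the buffer, which is exactly why I am asking how much real overlap is possible.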

               

              Do you mean that calling wglMakeCurrent(hdc, hglrc) or glDrawElements() with shaders bound will force the driver to wait for the completion of the currently launched OpenCL kernel? Or that the current OpenCL kernel execution will pause while the OpenGL scene is being rendered? Or both?

               

              Meteorhead wrote:

               

              If you want them to run really concurrently and race for resources, the only thing you'll accomplish is to break your image even more visibly (the same effect as not using V-Sync), because there will be a bigger delay between parts of the image being rendered.

               

              Can you be more specific here?

               

              Regarding NVIDIA CUDA, the only thing I know is that on Fermi+ hardware, two concurrently running kernels will share GPU resources if they originate from the same context. I do not know, however, whether an OpenGL context can share GPU resources with a CUDA kernel, unfortunately.

               

              Fred

                • Re: OpenGL / OpenCL concurrency, GPU resource sharing
                  Meteorhead

                  1: "As soon as it can" doesn't mean they have to run at the same time.

                   

                  2: glDrawElements() will not wait. It will place the kernels on the driver stack immediately and return; things happen asynchronously from then on. If you flood the GPU with OpenCL kernels, your Windows (or Linux) desktop will become choppy, because the desktop render cannot execute until the OpenCL kernels ahead of it on the stack are done.
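
                  If you want to observe when the draw has actually executed (as opposed to when the call returned), the usual tool is a sync object. A sketch, assuming a GL 3.2+ context and a loader header that exposes the sync functions:

                  ```c
                  /* Sketch: assumes a current GL 3.2+ context and a loader
                     (GLEW, glad, ...) declaring the sync-object entry points. */
                  #include <GL/glcorearb.h>

                  int draw_is_done;  /* 1 once the GPU has finished the draw */

                  void draw_with_fence(GLsizei count)
                  {
                      glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, 0);
                      /* The call above only enqueues work; a fence lets us track it. */
                      GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
                      glFlush();  /* make sure the fence is actually submitted */

                      /* Zero timeout: poll without blocking the CPU. */
                      GLenum state = glClientWaitSync(fence, 0, 0);
                      draw_is_done = (state == GL_ALREADY_SIGNALED ||
                                      state == GL_CONDITION_SATISFIED);
                      glDeleteSync(fence);
                  }
                  ```

                  This is also a cheap way to verify for yourself that glDrawElements() returns long before the GPU is done.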

                   

                  3: http://en.wikipedia.org/wiki/Screen_tearing

                   

                  4: OpenCL and OpenGL contexts are not the same; they are two rather different entities. That is why they will not share GPU resources (not even within CUDA). Not to mention that OpenGL is still a display kernel, while CUDA is compute...

                    • Re: OpenGL / OpenCL concurrency, GPU resource sharing
                      fred_em

                      Meteorhead wrote:

                       

                      > 1: "As soon as it can" doesn't mean they have to run at the same time.

                       

                      Long story short: I agree there is almost always a way to architect things so that they do not have to run in parallel, but let's leave that aside for this discussion. Let's say you are right here, still.

                       

                      > 2: glDrawElements() will not wait. It will place the kernels on the driver stack immediately and return; things happen asynchronously from then on. If you flood the GPU with OpenCL kernels, your Windows (or Linux) desktop will become choppy, because the desktop render cannot execute until the OpenCL kernels ahead of it on the stack are done.

                       

                      OK, so glDrawElements issues a driver kernel, just like an OpenCL kernel launch does.

                      Basically you're saying the driver will always process one driver kernel at a time, sequentially, so concurrency never happens.

                       

                      To simulate concurrency (or, if you prefer, to make sure OpenGL rendering and OpenCL calculations progress at the same pace), I would have to break my kernel down into smaller subkernels, launch them iteratively, and call glDrawElements in between, periodically. In other words, instead of doing:

                       

                      launchOpenCLKernel(1000x1000_matrix);
                      glDrawElements(1000_primitives);

                       

                      I would have to do:

                       

                      for (int n = 0; n < 100; n++)
                      {
                          launchOpenCLKernel(10x1000_matrix);
                          glDrawElements(10_primitives);
                      }

                       

                      > 3: http://en.wikipedia.org/wiki/Screen_tearing

                       

                      > 4: OpenCL and OpenGL contexts are not the same; they are two rather different entities. That is why they will not share GPU resources (not even within CUDA). Not to mention that OpenGL is still a display kernel, while CUDA is compute...

                       

                      They WILL share the resources in the case of 2 CUDA kernels created from the same context.

                       

                      NVIDIA graphics cards have one or more SMs (Streaming Multiprocessors); the number of SMs varies roughly between 1 and 16. Apparently, when you run 2 CUDA kernels on Fermi+ hardware, they will each run on half the SMs, in parallel, much like 2 cores being used on a CPU.

                       

                      Again, I do not know about an OpenGL kernel/CUDA kernel mix. I see no information on the web, anywhere.

                       

                      Fred

                        • Re: OpenGL / OpenCL concurrency, GPU resource sharing
                          Meteorhead

                          4: Yes, in CUDA they will, because Fermi and later HW has a thread dispatch processor that can issue different kernels to different SMs. You can do: clEnqueueNDRangeKernel(kernel1, ...); clEnqueueNDRangeKernel(kernel2, ...); and as long as these kernels do not flood the GPU, they will be issued to different SMs simultaneously. On AMD cards this will not happen: the thread dispatch processor can only run one type of kernel at any given time on all of the SMs. HD69xx can run 2 different types; Fermi, however, can run an arbitrary number of types.
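
                          To spell that out in host code: the usual way to even give the runtime a chance to overlap two kernels is to enqueue them on two independent in-order queues in the same context, so no ordering is imposed between them. A sketch; the names (ctx, dev, k1, k2 and the global sizes) stand in for objects created elsewhere, and error checking is omitted:

                          ```c
                          #include <CL/cl.h>

                          /* Sketch: ctx, dev, k1, k2, gsz1, gsz2 assumed to be set up already. */
                          void launch_pair(cl_context ctx, cl_device_id dev,
                                           cl_kernel k1, cl_kernel k2,
                                           size_t gsz1, size_t gsz2)
                          {
                              cl_int err;
                              /* Two in-order queues: the runtime is free to overlap them,
                                 so a Fermi-class dispatcher MAY run the kernels on
                                 different SMs. Whether it actually does is up to the HW. */
                              cl_command_queue q1 = clCreateCommandQueue(ctx, dev, 0, &err);
                              cl_command_queue q2 = clCreateCommandQueue(ctx, dev, 0, &err);

                              clEnqueueNDRangeKernel(q1, k1, 1, NULL, &gsz1, NULL, 0, NULL, NULL);
                              clEnqueueNDRangeKernel(q2, k2, 1, NULL, &gsz2, NULL, 0, NULL, NULL);
                              clFlush(q1);
                              clFlush(q2);   /* push both to the device before waiting */
                              clFinish(q1);
                              clFinish(q2);

                              clReleaseCommandQueue(q1);
                              clReleaseCommandQueue(q2);
                          }
                          ```

                          Note that this only permits overlap; it does not guarantee it, and on AMD HW (as said above) the two kernels will still run one after the other.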

                           

                          2: If it makes sense to draw 10 primitives from a scene and to do part of a job, then yes. But I fear that partial rendering hardly ever makes sense.

                           

                          About concurrency: different vendors implement their drivers in different manners. The thing to keep in mind is that there is a queue inside the driver, and one inside the GPU. When you call clEnqueueNDRangeKernel(), the kernel calls are placed onto the driver queue. When you call clFlush(), they are explicitly sent out to the device. Concurrency on AMD HW practically never happens. The best you can get is to always keep the queues busy, and feed them with kernels in whatever order suits your needs.

                           

                          If you want to do all the OpenCL work and then render one image, you would do:

                           

                          for (int i = 0; ...)
                          {
                              1) clEnqueueNDRangeKernel(matrix_transpose, NxN, prefWGS, ...);
                              2) glDrawElements(entire_scene);
                              3) glFlush();
                              4) clFlush();
                              5) whatever_work_remains_on_CPU();
                              6) clFinish();
                              7) glFinish();
                          }

                           

                          Notice they are sent out to the device concurrently, but the OpenCL kernels will definitely finish before the OpenGL render. Does it make sense to render part of a scene? What sort of application is this?