24 Replies Latest reply on Jun 8, 2010 9:29 AM by genaganna

    Using Multiple GPUs in an OpenCL program

    richi

      I have a PC with 2 R4870 video cards running Linux. Using pyOpenCL, I can run a program on either GPU, but when I try to run 2 kernels simultaneously (one on each card), it seems that the second queued kernel cannot start until the first one has finished. I expected that I could queue 2 instances of the same kernel, one on each GPU, and that the total running time would be roughly the same as when running a single instance, but this is not happening. I have tried calling clFlush after queuing each kernel, but the running time is still the same.

      Is it possible to use both (multiple) GPUs simultaneously in OpenCL? How can this be done?

        • n0thing

          Right now, multi-GPU or CPU+GPU execution doesn't work on AMD's implementation.

          It just returns a CL_COMPLETE execution status after the clFlush command.

            • nou

              I do not think so. SmallLuxGPU works well on multi-GPU and CPU+GPU, but it uses one context per device.

                • davibu

                  Originally posted by: nou I do not think so. SmallLuxGPU works well on multi-GPU and CPU+GPU, but it uses one context per device.

                  Yup, and I use 1 thread per GPU too. So 1 thread, 1 context, 1 queue for each GPU. I tried other configurations but they weren't working (i.e. not running in parallel).
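
                  A minimal pyOpenCL sketch of the layout described above (one host thread, one context, and one queue per GPU). The helper names `run_per_device` and `opencl_worker`, the `scale` kernel, and the buffer size are illustrative assumptions, not code from SmallLuxGPU, and the calls follow the current pyOpenCL API rather than the 2010 one:

```python
import threading

# Hypothetical kernel, used only to give each device something to do.
KERNEL_SRC = """
__kernel void scale(__global float *data) {
    int i = get_global_id(0);
    data[i] = 2.0f * data[i];
}
"""


def run_per_device(devices, worker):
    """Run worker(device) in its own host thread, one thread per device."""
    results = [None] * len(devices)

    def target(index, device):
        results[index] = worker(device)

    threads = [
        threading.Thread(target=target, args=(i, d))
        for i, d in enumerate(devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # same order as devices


def opencl_worker(device):
    """Build a private context and queue for one device and run the kernel."""
    import numpy as np
    import pyopencl as cl  # imported lazily; needs pyOpenCL and a driver

    ctx = cl.Context([device])      # one context per device
    queue = cl.CommandQueue(ctx)    # one queue per context
    program = cl.Program(ctx, KERNEL_SRC).build()

    host = np.arange(1024, dtype=np.float32)
    flags = cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR
    buf = cl.Buffer(ctx, flags, hostbuf=host)

    program.scale(queue, host.shape, None, buf)
    cl.enqueue_copy(queue, host, buf)  # blocking read back into host
    queue.finish()
    return host
```

                  With two GPUs visible, `run_per_device(cl.get_platforms()[0].get_devices(device_type=cl.device_type.GPU), opencl_worker)` would start one worker per card. Whether the kernels actually overlap still depends on the driver.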

                  BTW, it looks like it works well only under Linux, because I'm experiencing horrible performance under Windows 7 64-bit with multiple GPUs (but this could be related to some thread/mutex issue and not to the OpenCL driver; I'm still investigating the problem).

                    • empty_knapsack

                      I can only say that at the CAL level (and OpenCL is obviously built upon CAL) there are numerous problems with multiple GPUs.

                      You definitely need one thread and one context per GPU to make it work. But that isn't enough, because almost no CAL function is thread-safe, so calling calResMap() (which is the only way to get access to local GPU memory) in one thread blocks all other threads/contexts.

                      And (as I've already written on these forums), OpenCL uses the calCtxWaitForEvent() function instead of a CPU-burning loop

                      while (calCtxIsEventDone(calCtx, e) == CAL_RESULT_PENDING);

                      to wait for GPU kernel completion.

                      But calCtxWaitForEvent() also blocks every context currently running. This is especially noticeable when there are different devices in the system (like 5770+4770). So basically it's simply impossible to work asynchronously with multiple GPUs within a single process.

                      All of the above applies to the Windows version of CAL; I have never tried the Linux one.
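
                      For comparison, an OpenCL-level analogue of the polling loop above can be sketched in Python. This is not CAL code: `spin_wait` is a hypothetical helper, it only assumes an event object exposing `command_execution_status` (as `pyopencl.Event` does), and whether polling sidesteps the blocking described above on a given driver is untested here:

```python
import time

CL_COMPLETE = 0  # value of CL_COMPLETE in the OpenCL headers


def spin_wait(event, poll_interval=0.0):
    """Poll an event's execution status until it reports completion.

    `event` is anything exposing `command_execution_status`, as
    pyopencl.Event does. Returns the number of polls performed.
    """
    polls = 0
    while True:
        status = event.command_execution_status
        polls += 1
        if status == CL_COMPLETE:
            return polls
        if status < 0:
            # Negative status values mean the command terminated abnormally.
            raise RuntimeError("command failed with status %d" % status)
        if poll_interval > 0:
            time.sleep(poll_interval)  # yield the CPU between polls
```

                      A non-zero `poll_interval` trades a little latency for not pinning a CPU core, which is the cost of the pure busy-wait.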

                        • alexg

                          For my education, is it currently impossible to use multiple GPUs or multiple graphics cards?

                          In other words, what about a single Radeon 5970 (a dual-GPU card)?

                            • nou

                              Well, using multiple GPUs is possible, but there is an issue with CrossFire. If you have CrossFire enabled, the second GPU returns incorrect results. With the 5970 you cannot disable CrossFire, so you can use only the first GPU. This should be fixed in the next driver or SDK.

                                • alexg

                                  Originally posted by: nou Well, using multiple GPUs is possible, but there is an issue with CrossFire. If you have CrossFire enabled, the second GPU returns incorrect results. With the 5970 you cannot disable CrossFire, so you can use only the first GPU. This should be fixed in the next driver or SDK.

                                  If I understood the previous discussion correctly, one would need to run multiple CPU threads to use multiple GPUs, but the SDK is not thread-safe, so this is not really an option.

                                    • achinda99

                                      While I'm not too familiar with ATI systems, with NVIDIA hardware SLI must be disabled to utilize multiple GPUs. I would assume that for ATI, CrossFire would need to be disabled too.

                                      As for multi-GPU, it is quite doable, and there is a wonderful example in the NVIDIA SDK. I have successfully implemented my own version and tested it across various devices. The basic concept is to:

                                      1. Find all compatible devices

                                      2. For each device, create a command queue on the same context

                                      3. Allocate work to each queue individually
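
                                      The three steps above can be sketched in pyOpenCL roughly as follows. `split_work` and `make_queues` are hypothetical helper names, the first platform is assumed to hold all the GPUs, and this is not the NVIDIA SDK sample:

```python
def split_work(total_items, num_devices):
    """Return (offset, count) slices covering total_items as evenly as possible."""
    base, extra = divmod(total_items, num_devices)
    slices, offset = [], 0
    for i in range(num_devices):
        count = base + (1 if i < extra else 0)  # first `extra` devices get one more
        slices.append((offset, count))
        offset += count
    return slices


def make_queues():
    """Steps 1 and 2: find the GPUs, then one shared context with a queue per GPU."""
    import pyopencl as cl  # imported lazily; needs pyOpenCL and a driver

    # Assumption: every GPU of interest sits on the first platform.
    devices = cl.get_platforms()[0].get_devices(device_type=cl.device_type.GPU)
    ctx = cl.Context(devices)  # a single context shared by all devices
    queues = [cl.CommandQueue(ctx, device=d) for d in devices]
    return ctx, queues
```

                                      Step 3 then amounts to enqueueing the kernel once per queue, with each launch covering one `(offset, count)` slice from `split_work`.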

                                      I have this successfully running with NVIDIA cards, however what brought me back to the ATI forum is that my program crashes when using the ATI SDK and driver.  ATI may not currently support multiGPU.

                                        • dravisher

                                          achinda99: That does work in principle on the AMD implementation, but it seems that AMD's OpenCL is "lazy", so unless you explicitly call flush or finish on the queue (clFlush/clFinish), it won't do much, if anything. The only way to be sure seems to be to launch two host threads and call flush or finish from each of them on its respective command queue.

                                          This seems unnecessarily complicated (and apparently unsafe according to other posts), so I hope AMD will make their implementation less lazy in the future, so that the command queue gets to work without an explicit flush or finish call.
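
                                          The ordering in question (enqueue on every queue, flush each, and only then wait) can be sketched from a single host thread as below. `launch_on_all` and its `enqueue` callback are hypothetical names; the queues only need `flush()` and `finish()` methods, as `pyopencl.CommandQueue` has. Earlier posts in this thread suggest this alone did not produce overlap on AMD's implementation at the time, so treat it as an illustration of the call order only:

```python
def launch_on_all(queues, enqueue):
    """Enqueue work on every queue, flush them all, then wait on each.

    `enqueue(queue)` is a caller-supplied function that submits the
    kernel to one queue and returns its event.
    """
    events = [enqueue(q) for q in queues]
    for q in queues:
        q.flush()   # submit queued commands without waiting (clFlush)
    for q in queues:
        q.finish()  # block until this queue has drained (clFinish)
    return events
```

                                          The two-thread variant described above would instead call `flush()` or `finish()` on each queue from its own host thread.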

                                          Anyone, please correct me if I'm wrong, because I hope I am.

                                          • genaganna

                                            Originally posted by: achinda99 While I'm not too familiar with ATI systems, with NVIDIA hardware SLI must be disabled to utilize multiple GPUs. I would assume that for ATI, CrossFire would need to be disabled too.

                                            As for multi-GPU, it is quite doable, and there is a wonderful example in the NVIDIA SDK. I have successfully implemented my own version and tested it across various devices. The basic concept is to:

                                            1. Find all compatible devices

                                            2. For each device, create a command queue on the same context

                                            3. Allocate work to each queue individually

                                            I have this successfully running with NVIDIA cards, however what brought me back to the ATI forum is that my program crashes when using the ATI SDK and driver.  ATI may not currently support multiGPU.

                                            achinda99,

                                            Please provide a test case or your code to reproduce this issue. Please also provide your system details, such as OS, GPU, driver version, and SDK version.

                                              • achinda99

                                                I narrowed the problem down to writing a 2D image from the host to the device, and I created another thread for that. In this thread, I was merely commenting that what brought me back to the forum was trouble running my code on ATI devices. In response to the ongoing discussion, I was just suggesting that maybe the current OpenCL implementation in the Stream SDK doesn't support multiGPU, which appears to be incorrect, as someone pulled it off.

                                                  • dmarchet

                                                    Is there any issue report or SDK fix to really support multiple GPUs?

                                                    (Without creating separate contexts, queues, and threads and running them in parallel?)

                                                    Any examples?

                                                    This will be running on an si-28 embedded platform with an AMD CPU + 2 onboard ATI E490 + HD3200 chipset 780E, so there is no way I can disable/enable CrossFire...

                                • ebfe

                                  Pyrit also uses separate contexts and queues for all GPUs. It's a bug in AMD's implementation.

                                  Also see http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=128846&enterthread=y