1 Reply Latest reply on Feb 21, 2008 10:48 PM by michael.chu

    Multi GPU Cluster stream computing


      Hi all,
      I have a question:
      Is it better to have many GPUs on a single motherboard for parallel computation, or to have distributed nodes connected over Gigabit LAN, each motherboard with a PCIe GPU, running programs that use the MPICH2 or LAM/MPI message-passing interface?
      On the AMD stream computing web page I have seen a PDF document
      about the PGI compiler, which supports LAM/MPI, MPICH2, and the CAL/Brook+
      GPU programming libraries. I need more information about this and about how to program in the Brook+ language with the CAL SDK.
      Where can I find documentation on the data types of the Brook+ language
      (for example int, char) and on its supported function calls?
        • Multi GPU Cluster stream computing
          Unfortunately, it isn't that easy to say which one is going to be better. It is somewhat application and dataflow dependent. Do you have a particular application/dataflow you are thinking about?

          For example, while a GigE-linked cluster means that communication between GPUs on separate nodes will be slower than on a multi-GPU system, it also means that you now have a CPU dedicated to processing data and feeding each GPU, whereas in a multi-GPU setup a single CPU has to coordinate and feed data to 2 GPUs (once again, how much this matters depends on the application). You also have to consider that a cluster gives you multiple disk controllers and disks servicing your application. Of course, this only matters if your application reads from or writes to disk often.
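
          To put a rough number on the communication gap, here is a back-of-envelope sketch in Python. The bandwidth figures are assumptions for illustration, not measurements: 125 MB/s is the theoretical line rate of Gigabit Ethernet (real MPI throughput over GigE will be lower), and 4 GB/s is the theoretical per-direction bandwidth of a PCIe 1.x x16 slot. It compares only the link each copy would bottleneck on, ignores the PCIe hops at both ends of the network path, and ignores latency unless you pass one in. The 64 MiB buffer size is hypothetical.

```python
# Back-of-envelope transfer-time estimate. All bandwidth figures are
# illustrative assumptions (theoretical peaks), not measured numbers.

GIGE_BYTES_PER_S = 125e6      # assumed: 1 Gbit/s line rate, no protocol overhead
PCIE_X16_BYTES_PER_S = 4e9    # assumed: PCIe 1.x x16, one direction, theoretical


def transfer_seconds(nbytes: int, bytes_per_s: float, latency_s: float = 0.0) -> float:
    """Time to move nbytes over a link with the given bandwidth, plus a fixed latency."""
    return latency_s + nbytes / bytes_per_s


if __name__ == "__main__":
    buf = 64 * 1024 * 1024  # hypothetical 64 MiB buffer exchanged between GPUs
    t_net = transfer_seconds(buf, GIGE_BYTES_PER_S)
    t_bus = transfer_seconds(buf, PCIE_X16_BYTES_PER_S)
    print(f"GigE: {t_net:.3f} s, PCIe x16: {t_bus:.4f} s, ratio: {t_net / t_bus:.0f}x")
```

          With these assumptions the network copy is 32 times slower, which is simply the ratio of the two bandwidths; for a real decision, time your own MPI sends and host-to-GPU transfers instead.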

          If your application needs little CPU work (e.g., you aren't reordering or otherwise preprocessing data on the CPU), then a multi-GPU setup will probably let you finish your computation faster.

          If your CPU is effectively maxed out feeding your GPUs and preprocessing data, for example, then a cluster might be better.
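
          The trade-off in the last two paragraphs can be sketched with a deliberately simple per-iteration model in Python. Every parameter is a hypothetical per-iteration time, and the model assumes: CPU preprocessing and GPU compute are pipelined, so an iteration costs the slower of the two stages; on the multi-GPU node one host thread prepares data for each GPU in turn, so CPU cost scales with the GPU count; on the cluster each node has its own CPU but pays a network exchange that does not overlap with compute. Real codes may use several cores or overlap communication, so treat this only as a way to reason about where the bottleneck sits.

```python
# Toy per-iteration cost model for the multi-GPU-node vs. cluster question.
# All inputs are hypothetical per-iteration times in seconds.


def multi_gpu_node_time(n_gpus: int, cpu_s: float, gpu_s: float) -> float:
    """One host thread prepares data for each GPU in turn (CPU cost scales with
    n_gpus); GPUs compute in parallel; stages overlap, so the slower one wins."""
    return max(n_gpus * cpu_s, gpu_s)


def cluster_time(cpu_s: float, gpu_s: float, comm_s: float) -> float:
    """Each node's own CPU feeds one GPU, plus a network exchange (e.g. MPI over
    GigE) that is assumed not to overlap with compute."""
    return max(cpu_s, gpu_s) + comm_s


def better_setup(n_gpus: int, cpu_s: float, gpu_s: float, comm_s: float) -> str:
    """Return whichever setup has the lower modeled time per iteration."""
    node = multi_gpu_node_time(n_gpus, cpu_s, gpu_s)
    clus = cluster_time(cpu_s, gpu_s, comm_s)
    return "multi-GPU node" if node <= clus else "cluster"


if __name__ == "__main__":
    # Light CPU work: one CPU keeps both GPUs busy and there is no network cost.
    print(better_setup(n_gpus=2, cpu_s=0.01, gpu_s=0.10, comm_s=0.05))  # → multi-GPU node
    # Heavy CPU preprocessing: one CPU cannot keep two GPUs fed.
    print(better_setup(n_gpus=2, cpu_s=0.10, gpu_s=0.10, comm_s=0.02))  # → cluster
```

          The two calls reproduce the two cases described above; plugging in timings measured from your own application is what actually settles it.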

          Once again, it depends on the application. If you can tell me what your general dataflow and computation are, I can give you a more concrete recommendation.

          Have you downloaded the SDK from the website? If so, there are doc directories under the Brook+ and CAL directories. We are working on improving the documentation, but take a look and let me know if you have any questions.