Multi-GPU cluster stream computing

Hi all,
I have a question:
Is it better to have many GPUs on a single motherboard for parallel computation, or to have distributed nodes connected over Gigabit LAN, each motherboard with a PCI-X GPU, using MPICH2 or LAM/MPI message-passing interface programs?
On the AMD Stream Computing web page I have seen a PDF document
about a program called the PGI compiler, which supports LAM/MPI and MPICH2 and the CAL/Brook+
GPU programming libraries. I need more information about this, and about how to program in the Brook+ language with the CAL SDK.
Where can I find documentation on the data types of the Brook+ language
(for example int, char) and the supported function calls?