The Brook+ runtime does not automatically do load balancing. This control is given to the user in Brook+ 1.4. You can take a look at the Stream Computing User Guide for the details. There are samples that ship with the Brook+ SDK that use a multi-GPU implementation, under CPP\apps\MonteCarlo_MultiGPU and CPP\tutorials\MultiGPU.
The default GPU is the first one.
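As a rough illustration of the user-controlled device selection mentioned above, the shape of the code in the multi-GPU samples is roughly as follows. This is only a sketch: the header name and the exact signatures of `brook::getDevices` and `brook::useDevices` are recollections from the Brook+ 1.4 samples and should be verified against the Stream Computing User Guide and the SDK headers.

```
// Sketch only -- check signatures against the Brook+ 1.4 SDK headers.
#include "brook/Device.h"   // header path is an assumption

unsigned int count = 0;
// Enumerate the available CAL (GPU) devices.
brook::Device* devices = brook::getDevices("cal", &count);
if (count > 1) {
    // Direct subsequent stream allocations/kernels to the second GPU
    // instead of the default (first) one.
    brook::useDevices(&devices[1], 1, NULL);
}
```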
Originally posted by: gaurav.garg
The Brook+ runtime does not automatically do load balancing. This control is given to the user in Brook+ 1.4. You can take a look at the Stream Computing User Guide for the details. There are samples that ship with the Brook+ SDK that use a multi-GPU implementation, under CPP\apps\MonteCarlo_MultiGPU and CPP\tutorials\MultiGPU.
The default GPU is the first one.
You can follow the CPP examples to target a specific GPU in a multi-GPU system.
But we have a cluster of machines, each of which has multiple GPUs, and we've found that the simplest way to address each GPU is to use "setenv DISPLAY :0.x" in csh, where x = 0, 1, 2, ... is the ID of the GPU. We then use MPI to fire up a job on each GPU in the cluster. Message passing between the jobs is handled by MPI calls. We have found this method to be the most scalable for a distributed multi-GPU cluster. You write one code base using MPI, and it works on a single multi-GPU machine or on a cluster containing any number of multi-GPU machines. No complicated multi-GPU Brook+ APIs.
When it comes to load balancing, you are on your own. But if you have symmetric GPUs, load balancing is not needed.
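To make the DISPLAY-per-rank scheme above concrete, a minimal launcher script might look like the following. This is a sketch, not the poster's actual setup: the script and binary names are placeholders, it uses POSIX sh rather than csh, and it assumes Open MPI, which exports `OMPI_COMM_WORLD_LOCAL_RANK` to each process (other MPI implementations use different variable names).

```shell
#!/bin/sh
# Hypothetical launcher: map each local MPI rank to one GPU via DISPLAY.
# Assumes Open MPI, which sets OMPI_COMM_WORLD_LOCAL_RANK (0, 1, 2, ...)
# for each process on a node; default to 0 if the variable is unset.
rank="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"

# Brook+ selects the GPU from the X display: rank 0 -> :0.0, rank 1 -> :0.1, ...
DISPLAY=":0.${rank}"
export DISPLAY

echo "local rank ${rank} -> DISPLAY=${DISPLAY}"
# exec ./my_brook_app "$@"   # placeholder binary name -- substitute your own
```

You would then launch one process per GPU with something like `mpirun -np 4 ./launch.sh` on a four-GPU node, and scale the same command up across the cluster.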
Originally posted by: hagen You can follow the CPP examples to target a specific GPU in a multi-GPU system.
But we have a cluster of machines, each of which has multiple GPUs, and we've found that the simplest way to address each GPU is to use "setenv DISPLAY :0.x" in csh, where x = 0, 1, 2, ... is the ID of the GPU. We then use MPI to fire up a job on each GPU in the cluster. Message passing between the jobs is handled by MPI calls. We have found this method to be the most scalable for a distributed multi-GPU cluster. You write one code base using MPI, and it works on a single multi-GPU machine or on a cluster containing any number of multi-GPU machines. No complicated multi-GPU Brook+ APIs.
Thank you! This is very useful information.
When it comes to load balancing, you are on your own. But if you have symmetric GPUs, load balancing is not needed.
Does this mean that by default there's no assumption that the multiple GPUs must be identical (or even in the same family)? For example, can I add a Radeon 5700 to my desktop specifically for stream programming, while the display stays connected to an old Radeon 4000-series card?
Also, why is load balancing not needed when GPUs are symmetric? Do you assume that multiple tasks have the same length?
Thanks in advance!
Originally posted by: hagen You can follow the CPP examples to target a specific GPU in a multi-GPU system.
But we have a cluster of machines, each of which has multiple GPUs, and we've found that the simplest way to address each GPU is to use "setenv DISPLAY :0.x" in csh, where x = 0, 1, 2, ... is the ID of the GPU. We then use MPI to fire up a job on each GPU in the cluster. Message passing between the jobs is handled by MPI calls. We have found this method to be the most scalable for a distributed multi-GPU cluster. You write one code base using MPI, and it works on a single multi-GPU machine or on a cluster containing any number of multi-GPU machines. No complicated multi-GPU Brook+ APIs.
When it comes to load balancing, you are on your own. But if you have symmetric GPUs, load balancing is not needed.
Can you explain a bit more? I'm new to MPI and don't have much knowledge of Linux.
Do you:
Run a job after "setenv DISPLAY :0.0", then
open another csh and run a job after "setenv DISPLAY :0.1", then
open another csh and run a job after "setenv DISPLAY :0.2", then
open another csh and run a job after "setenv DISPLAY :0.3"?
So the first core in each machine runs a job on the first GPU, the second core runs another job on the second GPU, and so on?
I'm confused about your technique; could you post it on a blog with a flow chart?
Help...