

jch
Journeyman III

Using MPI(mpich2) with -machinefile, CAL can't find GPU Devices.

windows 7(x64), SDK 2.4, catalyst 11.5, mpich2-1.3.2p1

Hi!
I am writing a CAL-MPI program that processes data in parallel on a GPU cluster. My problem is that 'mpiexec with the -localonly option' finds the devices, but 'mpiexec with the -machinefile option' can't find them.

My code and the results of running it are provided below. How can MPI find the GPU devices with the -machinefile option?

[Result 1]
C:\mpi_test>mpiexec -localonly -n 2 mpi_test.exe
numDevices = 00000004
numDevices = 00000004

[Result 2]
C:\mpi_test>mpiexec -n 2 mpi_test.exe
numDevices = 00000000
numDevices = 00000000

[Result 3]
C:\mpi_test>mpiexec -machinefile host.txt -n 2 mpi_test.exe
numDevices = 00000000
numDevices = 00000000

[host.txt] file
----------------
localhost:2
----------------

#include <stdio.h>
#include <mpi.h>
#include "cal.h"
#include "calcl.h"
#include <windows.h>

#define MASTER_NODE 0

CALAPI CALresult (CALAPIENTRY *calInit)(void);
CALAPI CALresult (CALAPIENTRY *calShutdown)(void);
CALAPI CALresult (CALAPIENTRY *calDeviceGetCount)(CALuint* count);

int LoadLibraryCal()
{
    HINSTANCE hDLL;

    if((hDLL = LoadLibraryA("aticalrt64.dll")) == 0)
        return FALSE;
    if((calInit = (CALresult (__cdecl *)(void)) GetProcAddress(hDLL, "calInit")) == 0)
        return FALSE;
    if((calShutdown = (CALresult (__cdecl *)(void))GetProcAddress(hDLL, "calShutdown")) == 0)
        return FALSE;
    if((calDeviceGetCount = (CALresult (__cdecl *)(CALuint *))GetProcAddress(hDLL, "calDeviceGetCount")) == 0)
        return FALSE;

    return TRUE;
}

int main(int argc, char *argv[])
{
    int size_mpi, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size_mpi);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CALuint numDevices = 0;

    if(LoadLibraryCal()==FALSE)
        return -1;
    if(calInit()!=CAL_RESULT_OK)
        return -1;

    if ( rank == MASTER_NODE )
    {
        if(calDeviceGetCount(&numDevices)!=CAL_RESULT_OK)
            return -1;
        printf("numDevices = %08x\n",numDevices);
    }
    else
    {
        if(calDeviceGetCount(&numDevices)!=CAL_RESULT_OK)
            return -1;
        printf("numDevices = %08x\n",numDevices);
    }

    MPI_Finalize();
    calShutdown();

    return 0;
}

rick_weber
Adept II

You need to make sure DISPLAY=:0 is set on all compute nodes. I think all you need to do is put export DISPLAY=:0 in your .bashrc file. Be warned that X11 forwarding won't work until you remove this variable.
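
A minimal sketch of how each rank could check this at startup on a Linux node, assuming the DISPLAY advice applies at all (the original poster is on Windows, where DISPLAY is not used). The helper names display_value_ok, display_is_set and warn_if_no_display are hypothetical; only the standard getenv and fprintf calls are used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the given DISPLAY value is usable
   (non-NULL and non-empty), 0 otherwise. */
static int display_value_ok(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Hypothetical helper: checks the DISPLAY variable of the current process. */
static int display_is_set(void)
{
    return display_value_ok(getenv("DISPLAY"));
}

/* Hypothetical helper: prints a warning to stderr when DISPLAY is missing,
   and returns the same value as display_is_set(). */
static int warn_if_no_display(int rank)
{
    int ok = display_is_set();
    if (!ok)
        fprintf(stderr,
                "rank %d: DISPLAY is not set; the GPU runtime may report 0 devices\n",
                rank);
    return ok;
}
```

Calling warn_if_no_display(rank) right after MPI_Comm_rank would show which ranks were started without the variable, since a process launched remotely does not necessarily inherit the environment of an interactive shell.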


My program runs on Windows 7 (x64), SDK 2.4, Catalyst 11.5, mpich2-1.3.2p1.
The 'mpiexec -localonly' command finds the GPU devices, as in [Result 1]. But 'mpiexec -machinefile' does not, as in [Result 2] and [Result 3].


We had a similar problem once in Windows trying to run CUDA applications on a single remote machine. Our solution was to install Microsoft HPC pack and create a cluster on a single machine. We left a single user logged into the console at all times and issued jobs with a flag that attached them to the console user. This allowed the jobs to access the video drivers and CUDA to run. You might be able to do something similar, but for the whole cluster. It's by no means an elegant solution, but it might work.
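
A rough diagnostic sketch for the situation described above, assuming the cause is that remotely launched processes run outside the interactive console session (an assumption, not something confirmed in this thread). The helper name in_console_session is hypothetical; ProcessIdToSessionId, GetCurrentProcessId and WTSGetActiveConsoleSessionId are Win32 calls. On non-Windows builds the function just reports "unknown".

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper.
   Returns  1 if this process runs in the active console session,
            0 if it runs in some other session (for example session 0,
              where Windows services live),
           -1 if this cannot be determined (non-Windows build or API failure). */
static int in_console_session(void)
{
#ifdef _WIN32
    DWORD session = 0;
    if (!ProcessIdToSessionId(GetCurrentProcessId(), &session))
        return -1;
    /* WTSGetActiveConsoleSessionId returns 0xFFFFFFFF when no session is
       attached to the console, which never matches a real session id. */
    return session == WTSGetActiveConsoleSessionId() ? 1 : 0;
#else
    return -1;
#endif
}

/* Prints the result next to the MPI rank so runs can be compared. */
static void report_session(int rank)
{
    printf("rank %d: in_console_session = %d\n", rank, in_console_session());
}
```

Comparing the output of report_session(rank) between a '-localonly' run and a '-machinefile' run would show whether the two launch modes place the process in different sessions.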


Thanks rick.weber.

But my program is intended to run on many clusters. Why can't a remotely executed program access the video drivers? How can a remote program access the drivers?

jch
Journeyman III

An mpich2 developer (Jayesh) said:
"Try using the "-localonly" option (or "-localroot" option) to use the GPUs on the local node. Supporting usage of GPUs on non-local nodes is in our to-do list."