I have four quad-socket, quad-core Intel Xeon nodes connected via InfiniBand. This setup is currently used to run MPI applications. Now I want to add an ATI card to each node and run a test OpenCL program that spans all nodes. For this purpose, which MPI implementation should I use, or do I need MPI at all?
PS: Running my OpenCL program across the cluster is a requirement, and I am totally new to OpenCL.