I am writing a program that uses multiple GPUs in OpenCL, so I am using MPI.
When I test my code, it works perfectly on a cluster of Tesla C2050s, which have 2GB of global memory each.
However, when I move my code to a cluster of Tesla M2070s, which have 5GB of global memory each, I get error code -4, which is "CL_MEM_OBJECT_ALLOCATION_FAILURE", when I try to write to the GPU buffers.
I used the same input for both runs and allocated 12 GPUs on both clusters. Does anyone know what the problem might be?
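
To narrow down which MPI rank hits the failure, the numeric status can be decoded and logged per rank. The following is only a minimal sketch: the helper names `cl_error_name` and `format_status` are hypothetical, not part of OpenCL or MPI, and the numeric values are the ones defined in the OpenCL header `CL/cl.h`. It is written without OpenCL or MPI calls so it compiles on its own.

```c
#include <stdio.h>

/* Hypothetical helper: map a few OpenCL status codes to their names.
   The numeric values match the definitions in CL/cl.h. */
static const char *cl_error_name(int status)
{
    switch (status) {
    case 0:  return "CL_SUCCESS";
    case -1: return "CL_DEVICE_NOT_FOUND";
    case -2: return "CL_DEVICE_NOT_AVAILABLE";
    case -3: return "CL_COMPILER_NOT_AVAILABLE";
    case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5: return "CL_OUT_OF_RESOURCES";
    case -6: return "CL_OUT_OF_HOST_MEMORY";
    default: return "UNKNOWN_CL_ERROR";
    }
}

/* Hypothetical helper: format a one-line report for one MPI rank.
   Returns the value of snprintf (characters that would be written). */
static int format_status(char *buf, size_t len, int rank, int status)
{
    return snprintf(buf, len, "rank %d: OpenCL status %d (%s)",
                    rank, status, cl_error_name(status));
}
```

In the real program, `rank` would come from `MPI_Comm_rank` and `status` from the return value of the failing call (here, the buffer write), so the log shows whether every rank fails or only the ranks sharing a node.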