Running MPI jobs efficiently on 2 x AMD EPYC 7742
I want to run 4 MPI jobs on a machine with 2 x AMD EPYC 7742 processors in the most efficient way. Each job uses 64 threads. A single job running alone is about 4 times more efficient than the same job when all 4 run simultaneously. I expected some overhead, but not a factor of 4.

To minimize potential conflicts, would it help to specify which threads each job uses? The output of lscpu shows that each NUMA node lists its threads in two groups: NODE0 has threads 0-63 and 128-191, and NODE1 has threads 64-127 and 192-255. Would the jobs run more efficiently if I ran job #1 on threads 0-63, job #2 on threads 128-191, job #3 on threads 64-127, and job #4 on threads 192-255?
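To make the proposed layout concrete, here is a minimal Python sketch of the job-to-thread mapping described above. It only builds the CPU ranges and prints one `taskset` command per job. The program name `./my_job` is a placeholder, the helper names are made up for illustration, and the sketch assumes the launcher does not override the affinity mask it inherits, which is not checked here.

```python
import os

# CPU ranges copied from the lscpu output quoted in the question.
JOB_CPUS = {
    1: range(0, 64),     # NODE0, first group  (0-63)
    2: range(128, 192),  # NODE0, second group (128-191)
    3: range(64, 128),   # NODE1, first group  (64-127)
    4: range(192, 256),  # NODE1, second group (192-255)
}


def cpu_list(cpus):
    """Format a contiguous range as 'first-last' for taskset --cpu-list."""
    return f"{cpus[0]}-{cpus[-1]}"


def taskset_command(job, program="./my_job"):
    """Build the shell command that would start one job on its CPU range.

    'program' is a hypothetical placeholder for the real launch command.
    """
    return f"taskset --cpu-list {cpu_list(JOB_CPUS[job])} {program}"


def pin_current_process(job):
    """Alternative: restrict the calling process to the job's CPUs (Linux only)."""
    os.sched_setaffinity(0, set(JOB_CPUS[job]))


if __name__ == "__main__":
    for job in sorted(JOB_CPUS):
        print(taskset_command(job))
```

Running the script prints the four commands. Whether this particular split actually removes the slowdown is the open question; one thing worth checking first is `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list`, which shows whether the two groups on a node are hardware threads of the same physical cores.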