[mpich-discuss] MPI_Reduce on an inter-communicator hangs
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Sun Apr 21 16:52:48 CDT 2024
I am calling MPI_Reduce on a set of inter-communicators created by MPI_Comm_spawn, each with one process in the local group (the single manager) and two processes in the remote group (the workers). The inter-communicators are visited one at a time in the manager.

All workers enter and exit MPI_Reduce without blocking, but the manager enters the first MPI_Reduce for the first inter-communicator and never returns. What am I doing wrong? I am using MPICH 4.1.2.
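For what it's worth, here is a minimal sanity check of the group layout I expect on each spawned inter-communicator (a hypothetical helper, not part of the programs below; the function name is mine):

// Hypothetical sanity check, not part of the programs below: report the
// local/remote group sizes of one spawned inter-communicator.
#include <mpi.h>
#include <cstdio>

static void print_intercomm_layout(MPI_Comm intercomm)
{
    int is_inter = 0, local_size = 0, remote_size = 0;
    MPI_Comm_test_inter(intercomm, &is_inter);      // 1 for an inter-communicator
    MPI_Comm_size(intercomm, &local_size);          // local group: expect 1 (the manager, spawned over MPI_COMM_SELF)
    MPI_Comm_remote_size(intercomm, &remote_size);  // remote group: expect N_IN_GROUP workers
    std::printf("inter=%d local=%d remote=%d\n", is_inter, local_size, remote_size);
}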
Here is my manager code:
#include <mpi.h>
#include <unistd.h>     // gethostname, sleep
#include <cstring>      // strstr
#include <iostream>

using namespace std;

#define N_PROC 4        // number of spawn calls / inter-communicators
#define N_IN_GROUP 2    // workers spawned per call

int main(int argc, char *argv[])
{
    int rank, world_size, error_codes[N_PROC];
    MPI_Comm intercoms[N_PROC];
    char hostname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Spawn the workers on this host; truncate the hostname at the first dot.
    gethostname(hostname, sizeof(hostname));
    char *p = strstr(hostname, ".");
    if (p != NULL)
        *p = '\0';

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", hostname);
    MPI_Info_set(info, "bind_to", "core");

    // One inter-communicator per spawn: 1 manager in the local group,
    // N_IN_GROUP workers in the remote group.
    for (int i = 0; i < N_PROC; ++i)
    {
        MPI_Comm_spawn("test_reduce_work", argv, N_IN_GROUP, info,
                       0, MPI_COMM_SELF, &intercoms[i], &error_codes[i]);
    }

    sleep(10);

    unsigned array[100]{0};
    for (int i = 0; i < N_PROC; ++i)
    {
        cout << "MANAGER: starting reduction " << i << "\n";
        MPI_Reduce(NULL, array, 100, MPI_UNSIGNED, MPI_SUM, MPI_ROOT,
                   intercoms[i]);
        cout << "MANAGER: finished reduction " << i << "\n";  // we never reach this point
    }

    for (int i = 0; i < 100; ++i) cout << array[i] << " ";
    cout << endl;

    MPI_Finalize();
}
And here is my worker code:
#include <mpi.h>
#include <unistd.h>     // sleep
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
    int rank, world_size;
    MPI_Comm manager_intercom;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Inter-communicator back to the spawning manager.
    MPI_Comm_get_parent(&manager_intercom);

    unsigned array[100]{1};   // element 0 is 1, the rest are zero-initialized

    cout << "WORKER: starting reduction\n";
    MPI_Reduce(array, NULL, 100, MPI_UNSIGNED, MPI_SUM, MPI_PROC_NULL,
               manager_intercom);
    cout << "WORKER: finishing reduction\n";

    sleep(10);
    MPI_Finalize();
}
Finally, here is the invocation:
$ mpiexec -launcher ssh -print-all-exitcodes -wdir /home/kmccall/test_dir -np 1 -ppn 1 test_reduce_man
Thanks,
Kurt