[mpich-discuss] MPI_Comm_idup hang with multiple nodes
Huiwei Lu
huiweilu at mcs.anl.gov
Tue Jun 2 17:04:53 CDT 2015
Hi Daniel,
Thanks for reporting the bug. I can reproduce it on my desktop. I will work
on a fix. You can track the progress here:
http://trac.mpich.org/projects/mpich/ticket/2269
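In the meantime, if you only need the duplicated communicators and can
live with blocking semantics, one possible interim workaround (a sketch
only; it sidesteps the nonblocking code path rather than fixing it, and
I have not verified it on your setup) is to replace the idup loop and
the MPI_Waitall in your test with blocking MPI_Comm_dup calls:

    /* Blocking duplication avoids the MPI_Comm_idup path implicated
     * in the hang; no request array or MPI_Waitall is needed. */
    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

This loses the overlap that MPI_Comm_idup provides, of course, so it is
only a stopgap until the fix lands.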
--
Huiwei Lu
Postdoc Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
http://www.mcs.anl.gov/~huiweilu/
On Tue, Jun 2, 2015 at 4:32 PM, Daniel Pou <daniel.pou at gmail.com> wrote:
> Starting with the test comm_idup_mul.c and increasing the value of
> NUM_ITER to 10, I am seeing a fairly regular hang with a build from the
> latest master (commit 25204, 5/31).
>
> I am running with 2 ranks (1 per node) from SLURM (v2.6.5). I was able to
> see the behavior with
>
> srun -N 2 -n 2 ./comm_idup_mul
> (linking against SLURM PMI) and
>
> salloc -N 2 -n 2 mpiexec ./comm_idup_mul
>
> I witnessed this with both the MXM and TCP netmods. I don't see any
> issues on single-node runs.
>
> Thank you,
> -Dan
>
> For reference:
> Modified code from http://trac.mpich.org/projects/mpich/ticket/1935
>
>
> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
> /*
>  * (C) 2012 by Argonne National Laboratory.
>  * See COPYRIGHT in top-level directory.
>  */
>
> /*
>  * Test creating multiple communicators with MPI_Comm_idup.
>  */
>
> #include <stdio.h>
> #include <mpi.h>
>
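> /* Number of simultaneous MPI_Comm_idup operations; the hang appears
>  * when this is raised to 10 and the job spans multiple nodes. */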
> #define NUM_ITER 10
>
> int main(int argc, char **argv)
> {
>     int i, rank;
>     MPI_Comm comms[NUM_ITER];
>     MPI_Request req[NUM_ITER];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     /* Issue all nonblocking duplications before completing any of them. */
>     for (i = 0; i < NUM_ITER; i++)
>         MPI_Comm_idup(MPI_COMM_WORLD, &comms[i], &req[i]);
>
>     /* Complete all outstanding duplications at once. */
>     MPI_Waitall(NUM_ITER, req, MPI_STATUSES_IGNORE);
>
>     for (i = 0; i < NUM_ITER; i++)
>         MPI_Comm_free(&comms[i]);
>
>     if (rank == 0)
>         printf(" No Errors\n");
>
>     MPI_Finalize();
>
>     return 0;
> }
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>