[mpich-discuss] MPI_Comm_idup hang with multiple nodes

Huiwei Lu huiweilu at mcs.anl.gov
Tue Jun 2 17:04:53 CDT 2015


Hi Daniel,

Thanks for reporting the bug. I can reproduce it on my desktop. I will work
on a fix. You can track the progress here.

http://trac.mpich.org/projects/mpich/ticket/2269

--
Huiwei Lu
Postdoc Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
http://www.mcs.anl.gov/~huiweilu/

On Tue, Jun 2, 2015 at 4:32 PM, Daniel Pou <daniel.pou at gmail.com> wrote:

> Starting with the test comm_idup_mul.c and increasing the value of
> NUM_ITER to 10, I am seeing a fairly regular hang with a build from
> the latest master (commit 25204, 5/31).
>
> I am running with 2 ranks (1 per node) under SLURM (v2.6.5). I was able
> to reproduce the behavior with both
>
> srun -N 2 -n 2 ./comm_idup_mul
>
> (linking against SLURM PMI) and
>
> salloc -N 2 -n 2 mpiexec ./comm_idup_mul
>
> I witnessed this with both the MXM and TCP netmods. I don't see any
> issues on single-node runs.
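>
> For reference, the netmod is selected when MPICH is built; a sketch of
> the relevant configure lines, assuming the standard ch3:nemesis device
> (the exact options used for the builds above are an assumption):
>
>   ./configure --with-device=ch3:nemesis        # TCP is the default netmod
>   ./configure --with-device=ch3:nemesis:mxm    # MXM netmod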
>
> Thank you,
> -Dan
>
> For reference:
> Modified code from http://trac.mpich.org/projects/mpich/ticket/1935
>
>
> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
> /*
>  *  (C) 2012 by Argonne National Laboratory.
>  *      See COPYRIGHT in top-level directory.
>  */
>
> /*
>  * Test creating multiple communicators with MPI_Comm_idup.
>  */
>
> #include <stdio.h>
> #include <mpi.h>
>
> #define NUM_ITER    10
>
> int main(int argc, char **argv)
> {
>     int i, rank;
>     MPI_Comm comms[NUM_ITER];
>     MPI_Request req[NUM_ITER];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
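>     /* Post several communicator duplications without completing any of
>      * them; having multiple MPI_Comm_idup requests outstanding at once
>      * appears to be what triggers the hang described above. */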
>     for (i = 0; i < NUM_ITER; i++)
>         MPI_Comm_idup(MPI_COMM_WORLD, &comms[i], &req[i]);
>
>     MPI_Waitall(NUM_ITER, req, MPI_STATUSES_IGNORE);
>
>     for (i = 0; i < NUM_ITER; i++)
>         MPI_Comm_free(&comms[i]);
>
>     if (rank == 0)
>         printf(" No Errors\n");
>
>     MPI_Finalize();
>
>     return 0;
> }
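>
> For comparison, here is a blocking variant of the same loop (a minimal
> sketch, untested here; useful only to check whether the hang is specific
> to the nonblocking duplication path):
>
> #include <stdio.h>
> #include <mpi.h>
>
> #define NUM_ITER    10
>
> int main(int argc, char **argv)
> {
>     int i, rank;
>     MPI_Comm comms[NUM_ITER];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     /* Blocking MPI_Comm_dup: each duplication completes before the next
>      * one starts, so no duplication requests are outstanding together. */
>     for (i = 0; i < NUM_ITER; i++)
>         MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
>
>     for (i = 0; i < NUM_ITER; i++)
>         MPI_Comm_free(&comms[i]);
>
>     if (rank == 0)
>         printf(" No Errors\n");
>
>     MPI_Finalize();
>
>     return 0;
> }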
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>

