[mpich-discuss] MPI_Comm_idup hang with multiple nodes

Daniel Pou daniel.pou at gmail.com
Tue Jun 2 16:32:38 CDT 2015


Starting with the test comm_idup_mul.c and increasing the value of NUM_ITER to 10, I am seeing a fairly regular hang with a build from the latest master (commit 25204, 5/31).

I am running with 2 ranks (1 per node) under SLURM (v2.6.5). I was able to see the behavior with

    srun -N 2 -n 2 ./comm_idup_mul

(linking against SLURM PMI) and with

    salloc -N 2 -n 2 mpiexec ./comm_idup_mul

I witnessed this with both the MXM and TCP netmods. I don't see any issues on single-node runs.

Thank you,
-Dan

For reference:
Modified code from http://trac.mpich.org/projects/mpich/ticket/1935


/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
/*
 *  (C) 2012 by Argonne National Laboratory.
 *      See COPYRIGHT in top-level directory.
 */

/*
 * Test creating multiple communicators with MPI_Comm_idup.
 */

#include <stdio.h>
#include <mpi.h>

#define NUM_ITER    10

int main(int argc, char **argv)
{
    int i, rank;
    MPI_Comm comms[NUM_ITER];
    MPI_Request req[NUM_ITER];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Start NUM_ITER nonblocking duplications of MPI_COMM_WORLD */
    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_idup(MPI_COMM_WORLD, &comms[i], &req[i]);

    /* Complete all outstanding duplication requests */
    MPI_Waitall(NUM_ITER, req, MPI_STATUSES_IGNORE);

    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_free(&comms[i]);

    if (rank == 0)
        printf(" No Errors\n");

    MPI_Finalize();

    return 0;
}
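
In case it helps with reproducing, the build step is just the usual wrapper compile (exact paths and environment are site-specific, so treat this as a rough sketch), followed by the srun/salloc invocations above:

    mpicc -o comm_idup_mul comm_idup_mul.c
    srun -N 2 -n 2 ./comm_idup_mul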