[mpich-discuss] MPI_Comm_idup hang with multiple nodes
Daniel Pou
daniel.pou at gmail.com
Tue Jun 2 16:32:38 CDT 2015
Starting with the test comm_idup_mul.c and increasing the value of
NUM_ITER to 10, I am seeing a fairly regular hang with a build from the
latest master (commit 25204, 5/31). I am running with 2 ranks (1 per node)
under SLURM (v2.6.5). I was able to reproduce the behavior with
srun -N 2 -n 2 ./comm_idup_mul
(linking against SLURM PMI) and with
salloc -N 2 -n 2 mpiexec ./comm_idup_mul
I see the hang with both the MXM and TCP netmods. I don't see any issues
on single-node runs.
Thank you,
-Dan
For reference:
Modified code from http://trac.mpich.org/projects/mpich/ticket/1935
/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
/*
 *  (C) 2012 by Argonne National Laboratory.
 *      See COPYRIGHT in top-level directory.
 */
/*
 *  Test creating multiple communicators with MPI_Comm_idup.
 */
#include <stdio.h>
#include <mpi.h>
#define NUM_ITER 10
int main(int argc, char **argv)
{
    int i, rank;
    MPI_Comm comms[NUM_ITER];
    MPI_Request req[NUM_ITER];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Start NUM_ITER nonblocking duplications of MPI_COMM_WORLD. */
    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_idup(MPI_COMM_WORLD, &comms[i], &req[i]);

    /* Wait for all duplications to complete. */
    MPI_Waitall(NUM_ITER, req, MPI_STATUSES_IGNORE);

    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_free(&comms[i]);

    if (rank == 0)
        printf(" No Errors\n");

    MPI_Finalize();
    return 0;
}
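For comparison, here is a quick blocking variant (my own sketch, not from
the ticket) that uses MPI_Comm_dup instead of MPI_Comm_idup. If this
completes cleanly across nodes, that would suggest the hang is specific to
the nonblocking duplication path.

/* Blocking comparison: duplicate MPI_COMM_WORLD NUM_ITER times with
 * MPI_Comm_dup instead of MPI_Comm_idup. */
#include <stdio.h>
#include <mpi.h>

#define NUM_ITER 10

int main(int argc, char **argv)
{
    int i, rank;
    MPI_Comm comms[NUM_ITER];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Blocking duplications; no requests to wait on. */
    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    for (i = 0; i < NUM_ITER; i++)
        MPI_Comm_free(&comms[i]);

    if (rank == 0)
        printf(" No Errors\n");

    MPI_Finalize();
    return 0;
}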