[mpich-discuss] MPI_Comm_idup hang with multiple nodes

Huiwei Lu huiweilu at mcs.anl.gov
Mon Jul 13 10:38:28 CDT 2015


Hi Daniel,

The bug tracked in that ticket has been fixed in the latest MPICH master (you
can also check out the nightly tarballs). Please let me know whether it fixes your issue.

Thanks,

--
Huiwei Lu
Postdoc Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
http://www.mcs.anl.gov/~huiweilu/

On Tue, Jun 2, 2015 at 5:04 PM, Huiwei Lu <huiweilu at mcs.anl.gov> wrote:

> Hi Daniel,
>
> Thanks for reporting the bug. I can reproduce it on my desktop. I will
> work on a fix. You can track the progress here.
>
> http://trac.mpich.org/projects/mpich/ticket/2269
>
>
> On Tue, Jun 2, 2015 at 4:32 PM, Daniel Pou <daniel.pou at gmail.com> wrote:
>
>> Starting with the test comm_idup_mul.c and increasing the value of
>> NUM_ITER to 10, I am seeing a fairly regular hang with a build from the
>> latest master (commit 25204, 5/31).
>>
>> I am running with 2 ranks (1 per node) under SLURM (v2.6.5). I was able to
>> reproduce the behavior both with
>>
>>     srun -N 2 -n 2 ./comm_idup_mul    (linking against SLURM PMI)
>>
>> and with
>>
>>     salloc -N 2 -n 2 mpiexec ./comm_idup_mul
>>
>> I witnessed this with both the MXM and TCP netmods. I don't see any
>> issues on single-node runs.
>>
>> Thank you,
>> -Dan
>>
>> For reference:
>> Modified code from http://trac.mpich.org/projects/mpich/ticket/1935
>>
>>
>> /* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
>> /*
>>  *  (C) 2012 by Argonne National Laboratory.
>>  *      See COPYRIGHT in top-level directory.
>>  */
>>
>> /*
>>  * Test creating multiple communicators with MPI_Comm_idup.
>>  */
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> #define NUM_ITER    10
>>
>> int main(int argc, char **argv)
>> {
>>     int i, rank;
>>     MPI_Comm comms[NUM_ITER];
>>     MPI_Request req[NUM_ITER];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     for (i = 0; i < NUM_ITER; i++)
>>         MPI_Comm_idup(MPI_COMM_WORLD, &comms[i], &req[i]);
>>
>>     MPI_Waitall(NUM_ITER, req, MPI_STATUSES_IGNORE);
>>
>>     for (i = 0; i < NUM_ITER; i++)
>>         MPI_Comm_free(&comms[i]);
>>
>>     if (rank == 0)
>>         printf(" No Errors\n");
>>
>>     MPI_Finalize();
>>
>>     return 0;
>> }
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>
>