[mpich-devel] segfault calling neighbor collectives in communicator with no topology

Dave Goodell goodell at mcs.anl.gov
Thu May 2 10:58:42 CDT 2013


Thanks for letting us know.  I've created a ticket to track this and commented on your suggestions there:

https://trac.mpich.org/projects/mpich/ticket/1833#comment:1

-Dave

On Apr 30, 2013, at 2:53 AM CDT, Lisandro Dalcin <dalcinl at gmail.com> wrote:

> I'm adding support for the MPI-3 neighborhood collectives to mpi4py.
> By mistake, I called a neighbor collective on COMM_SELF and got a
> segfault. Running under valgrind gives the trace below.
> 
> It seems that MPICH (3.0.4) does not check whether a topology is
> attached to the communicator. This should be fixed in
> MPIR_Topo_canon_nhb_count() in src/mpi/topo/topoutil.c by adding a
> check right after the following line:
> 
>    topo_ptr = MPIR_Topology_get(comm_ptr);
> 
> BTW, the same kind of check should also be added to MPIR_Topo_canon_nhb().
> 
> 
> ==14696== Invalid read of size 4
> ==14696==    at 0xDE0ED39: MPIR_Topo_canon_nhb_count (topoutil.c:283)
> ==14696==    by 0xDFA870A: MPIR_Ineighbor_allgather_default (inhb_allgather.c:50)
> ==14696==    by 0xDFA8B8B: MPIR_Ineighbor_allgather_impl (inhb_allgather.c:98)
> ==14696==    by 0xDFAE27C: MPIR_Neighbor_allgather_default (nhb_allgather.c:37)
> ==14696==    by 0xDFAE350: MPIR_Neighbor_allgather_impl (nhb_allgather.c:58)
> ==14696==    by 0xDFAE918: PMPI_Neighbor_allgather (nhb_allgather.c:155)
> ==14696==    by 0xDAD7B77: __pyx_pw_6mpi4py_3MPI_9Intracomm_25Neighbor_allgather (mpi4py.MPI.c:87767)
> ==14696==    by 0x31784DD280: PyEval_EvalFrameEx (in /usr/lib64/libpython2.7.so.1.0)
> ==14696==    by 0x31784DCEF0: PyEval_EvalFrameEx (in /usr/lib64/libpython2.7.so.1.0)
> ==14696==    by 0x31784DDCBE: PyEval_EvalCodeEx (in /usr/lib64/libpython2.7.so.1.0)
> ==14696==    by 0x317846DA36: ??? (in /usr/lib64/libpython2.7.so.1.0)
> ==14696==    by 0x3178449C0D: PyObject_Call (in /usr/lib64/libpython2.7.so.1.0)
> ==14696==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> 
> 
> --
> Lisandro Dalcin
> ---------------
> CIMEC (INTEC/CONICET-UNL)
> Predio CONICET-Santa Fe
> Colectora RN 168 Km 472, Paraje El Pozo
> 3000 Santa Fe, Argentina
> Tel: +54-342-4511594 (ext 1011)
> Tel/Fax: +54-342-4511169
