[mpich-discuss] Possible integer-overflow for MPI_COMM_WORLD in MPI_Iprobe
Jeff Hammond
jeff.science at gmail.com
Mon Jan 21 21:46:28 CST 2019
I was able to reproduce this with my own test (
https://github.com/jeffhammond/HPCInfo/blob/master/mpi/bugs/iprobe-overflow.c)
with Intel MPI 2019, so I will report the bug to the Intel MPI team. It
should be easy enough for them to figure out whether or not the bug
originates in MPICH.
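The test is essentially a tight loop over MPI_Iprobe on MPI_COMM_WORLD. A
minimal sketch of that kind of reproducer (not the exact contents of the
linked file; the loop bound and print interval are just illustrative)
looks like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int flag;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    double t0 = MPI_Wtime();

    /* Use a 64-bit loop counter so the loop index itself cannot overflow;
     * the point is to call MPI_Iprobe on MPI_COMM_WORLD more than
     * 2^31 times on the same communicator. */
    for (long long i = 1; i <= 3000000000LL; i++) {
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                   &flag, &status);
        if (i % 1000000LL == 0)   /* report progress every million calls */
            printf("%lld iterations, %f seconds\n", i, MPI_Wtime() - t0);
    }

    MPI_Finalize();
    return 0;
}

With Intel MPI 2019, the run ended as follows: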
2139000000 iterations, 627.143272 seconds
2140000000 iterations, 627.436206 seconds
2141000000 iterations, 627.729135 seconds
2142000000 iterations, 628.022049 seconds
2143000000 iterations, 628.315015 seconds
2144000000 iterations, 628.608066 seconds
2145000000 iterations, 628.901065 seconds
2146000000 iterations, 629.193992 seconds
2147000000 iterations, 629.488107 seconds
Abort(738833413) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Iprobe:
Invalid communicator, error stack:
PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG,
MPI_COMM_WORLD, flag=0x7ffdf75a396c, status=0x7ffdf75a3970) failed
PMPI_Iprobe(90).: Invalid communicator
Jeff, who works for Intel but knows more about MPICH than Intel MPI
On Mon, Jan 21, 2019 at 11:19 AM Joachim Protze via discuss <
discuss at mpich.org> wrote:
> Hi all,
>
> we detected the behavior with Intel MPI 2019 (which is based on MPICH
> 3.3). Reproducing it with MPICH 3.3 has not yet been successful, but I
> suspect that our build of MPICH simply does not use the necessary code
> path / build flags.
>
> When MPI_Iprobe is called with the same communicator ~2^31 times
> (which can take 10-30 minutes), execution stops with:
>
> Abort(201962501) on node 0 (rank 0 in comm 0): Fatal error in
> PMPI_Iprobe: Invalid communicator, error stack:
> PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG,
> MPI_COMM_WORLD, flag=0x7ffd925056c0, status=0x7ffd92505694) failed
> PMPI_Iprobe(90).: Invalid communicator
>
> From my reading of the referenced MPICH code lines, I suspect that the
> ref-count for MPI_COMM_WORLD overflows, which triggers this error
> message.
>
> Best
> Joachim
>
> --
> Dipl.-Inf. Joachim Protze
>
> IT Center
> Group: High Performance Computing
> Division: Computational Science and Engineering
> RWTH Aachen University
> Seffenter Weg 23
> D 52074 Aachen (Germany)
> Tel: +49 241 80-24765
> Fax: +49 241 80-624765
> protze at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
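For what it's worth, Joachim's ref-count hypothesis above is easy to
illustrate in isolation: if every MPI_Iprobe call bumps a signed 32-bit
reference count on the communicator and a later validity check requires
that count to be positive, the count goes negative right around the 2^31st
call. A standalone sketch of that arithmetic (this is not MPICH's actual
implementation, just the suspected mechanism with an assumed initial count
of 1):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int ref_count = 1;        /* hypothetical communicator ref-count */
    long long calls = 0;

    while (ref_count > 0) {   /* "valid" only while the count is positive */
        if (ref_count == INT_MAX) {
            /* The next increment would overflow (undefined behavior in C);
             * model it as the two's-complement wrap the report suggests. */
            ref_count = INT_MIN;
        } else {
            ref_count++;
        }
        calls++;
    }

    printf("ref-count went non-positive after %lld calls\n", calls);
    return 0;
}

That reports roughly 2147483647 calls, which lines up with the abort
appearing just after the 2147000000-iteration mark in the log above.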
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/