[mpich-discuss] Possible integer-overflow for MPI_COMM_WORLD in MPI_Iprobe
Joachim Protze
protze at itc.rwth-aachen.de
Mon Jan 21 13:19:28 CST 2019
Hi all,
we observed this behavior with Intel MPI 2019 (which is based on MPICH
3.3). We have not yet been able to reproduce it with MPICH 3.3 itself,
but I suspect that our build of MPICH simply does not take the
necessary code path / use the relevant build flags.
When calling MPI_Iprobe on the same communicator roughly 2^31 times
(which can take 10-30 minutes), the execution aborts with:
Abort(201962501) on node 0 (rank 0 in comm 0): Fatal error in
PMPI_Iprobe: Invalid communicator, error stack:
PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG,
MPI_COMM_WORLD, flag=0x7ffd925056c0, status=0x7ffd92505694) failed
PMPI_Iprobe(90).: Invalid communicator
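For reference, a minimal loop of the following shape should trigger
the overflow. This is a sketch rather than our exact test code; the
arguments match the call shown in the error stack above:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Probe MPI_COMM_WORLD slightly more than 2^31 times. If the
     * implementation increments a signed 32-bit reference count on
     * the communicator for every call, it overflows in this loop. */
    for (long long i = 0; i < (1LL << 31) + 16; i++) {
        int flag;
        MPI_Status status;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                   &flag, &status);
    }

    MPI_Finalize();
    return 0;
}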
From my reading of the referenced MPICH code, I suspect that the
reference count for MPI_COMM_WORLD overflows, which then triggers this
error message.
Best
Joachim
--
Dipl.-Inf. Joachim Protze
IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074 Aachen (Germany)
Tel: +49 241 80-24765
Fax: +49 241 80-624765
protze at itc.rwth-aachen.de
www.itc.rwth-aachen.de