[mpich-discuss] Possible integer-overflow for MPI_COMM_WORLD in MPI_Iprobe

Joachim Protze protze at itc.rwth-aachen.de
Mon Jan 21 13:19:28 CST 2019


Hi all,

we detected this behavior with Intel MPI 2019 (which is based on MPICH 
3.3). Reproducing it with MPICH 3.3 itself has not yet been successful, 
but I suspect that our build of MPICH simply does not use the necessary 
code path / build flags.

When MPI_Iprobe is called ~2^31 times with the same communicator 
(which can take 10-30 minutes), the execution aborts with:

Abort(201962501) on node 0 (rank 0 in comm 0): Fatal error in 
PMPI_Iprobe: Invalid communicator, error stack:
PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, 
MPI_COMM_WORLD, flag=0x7ffd925056c0, status=0x7ffd92505694) failed
PMPI_Iprobe(90).: Invalid communicator
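
For reference, a minimal reproducer sketch of the loop described above 
(the exact iteration margin of +16 is illustrative; the abort should 
appear once the call count passes ~2^31):

    #include <mpi.h>
    #include <stdint.h>

    int main(int argc, char **argv)
    {
        int flag;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        /* Call MPI_Iprobe on the same communicator until the call
           count passes 2^31; on the affected build this aborts with
           "Invalid communicator". */
        for (int64_t i = 0; i < (INT64_C(1) << 31) + 16; i++)
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                       &flag, &status);
        MPI_Finalize();
        return 0;
    }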

 From my understanding of the referenced MPICH code lines, I suspect 
that the reference count for MPI_COMM_WORLD overflows, which triggers 
this error message.
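
To illustrate the suspected failure mode (this is my assumption about 
the mechanism, not confirmed from the MPICH sources): if each 
MPI_Iprobe call adds a reference to the communicator without a matching 
release, a signed 32-bit counter wraps after ~2^31 calls:

    #include <limits.h>

    int refcount = INT_MAX;  /* reached after ~2^31 - 1 unreleased refs */
    refcount++;              /* signed overflow: UB in C, but wraps to
                                INT_MIN (negative) on two's-complement
                                machines in practice */
    /* A subsequent validity check along the lines of (refcount > 0)
       would then reject MPI_COMM_WORLD as an invalid communicator. */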

Best
Joachim

-- 
Dipl.-Inf. Joachim Protze

IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D-52074 Aachen (Germany)
Tel: +49 241 80-24765
Fax: +49 241 80-624765
protze at itc.rwth-aachen.de
www.itc.rwth-aachen.de
