<div dir="ltr">This was fixed in 2019 update 2.<div><br></div><div>Jeff</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Apr 26, 2019 at 8:09 AM Jeff Hammond &lt;<a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">For anyone who cares about this bug because of Intel MPI, I am told it is fixed in Intel MPI 2019 update 3.<div><br></div><div>Jeff</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 21, 2019 at 7:46 PM Jeff Hammond &lt;<a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>I was able to reproduce with my own test (<a href="https://github.com/jeffhammond/HPCInfo/blob/master/mpi/bugs/iprobe-overflow.c" target="_blank">https://github.com/jeffhammond/HPCInfo/blob/master/mpi/bugs/iprobe-overflow.c</a>) with Intel MPI 2019, so I will report that bug to the Intel MPI team.  It should be easy enough for them to figure out if this bug is from MPICH or not.<br></div><div><font color="#000000" style="background-color:rgb(255,255,255)"><br></font></div><div>





<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2139000000 iterations, 627.143272 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2140000000 iterations, 627.436206 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2141000000 iterations, 627.729135 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2142000000 iterations, 628.022049 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2143000000 iterations, 628.315015 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2144000000 iterations, 628.608066 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2145000000 iterations, 628.901065 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2146000000 iterations, 629.193992 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">2147000000 iterations, 629.488107 seconds</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">Abort(738833413) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Iprobe: Invalid communicator, error stack:</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD, flag=0x7ffdf75a396c, status=0x7ffdf75a3970) failed</font></span></p>
<p style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:13px;line-height:normal;font-family:'Andale Mono'"><span style="font-variant-ligatures:no-common-ligatures;background-color:rgb(255,255,255)"><font color="#000000">PMPI_Iprobe(90).: Invalid communicator</font></span></p></div><div><font color="#000000" style="background-color:rgb(255,255,255)"><br></font></div><div><font color="#000000" style="background-color:rgb(255,255,255)">Jeff, who works for Intel but knows more about MPICH than Intel MPI</font></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Jan 21, 2019 at 11:19 AM Joachim Protze via discuss &lt;<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>
<br>
we detected the following behavior with Intel MPI 2019 (which is based on MPICH <br>
3.3). We have not yet managed to reproduce it with MPICH 3.3, but I fear <br>
that our build of MPICH just does not use the necessary code path / build flags.<br>
<br>
When MPI_Iprobe is called on the same communicator ~2^31 times <br>
(which can take 10-30 minutes), execution stops with:<br>
<br>
Abort(201962501) on node 0 (rank 0 in comm 0): Fatal error in <br>
PMPI_Iprobe: Invalid communicator, error stack:<br>
PMPI_Iprobe(123): MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, <br>
MPI_COMM_WORLD, flag=0x7ffd925056c0, status=0x7ffd92505694) failed<br>
PMPI_Iprobe(90).: Invalid communicator<br>
<br>
From my understanding of the MPICH code lines referenced in the error stack, <br>
I guess that the reference count for MPI_COMM_WORLD overflows, which triggers <br>
this error message.<br>
<br>
Best<br>
Joachim<br>
<br>
-- <br>
Dipl.-Inf. Joachim Protze<br>
<br>
IT Center<br>
Group: High Performance Computing<br>
Division: Computational Science and Engineering<br>
RWTH Aachen University<br>
Seffenter Weg 23<br>
D 52074  Aachen (Germany)<br>
Tel: +49 241 80- 24765<br>
Fax: +49 241 80-624765<br>
<a href="mailto:protze@itc.rwth-aachen.de" target="_blank">protze@itc.rwth-aachen.de</a><br>
<a href="http://www.itc.rwth-aachen.de" rel="noreferrer" target="_blank">www.itc.rwth-aachen.de</a><br>
<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div></div></div></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>