<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div>Hello all,</div><div><br></div><div>In 2013 I had problems regarding the allocation of unexpected messages in MPI.</div><div>After your kind assistance, I implemented a "buffer" matrix in the receiver process, using MPI_IRECV, MPI_WAITANY and MPI_TESTANY functions (the code snippet is attached).</div><div><br></div><div>It has been working nicely since than until recently, when I faced the same problems again:</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="font-size:12.8000001907349px"><font face="monospace, monospace">Fatal error in MPI_Recv: Other MPI error, error stack:</font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPI_Recv(186)......................: MPI_Recv(buf=0x7fffe8dd5974, count=1, MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0xd213d0) failed</font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPIDI_CH3I_Progress(402)...........: </font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPID_nem_mpich2_blocking_recv(905).: </font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPID_nem_tcp_connpoll(1838)........: </font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">state_commrdy_handler(1676)........: </font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPID_nem_tcp_recv_handler(1564)....: </font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPID_nem_handle_pkt(636)...........: </font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace">MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for an unexpected message. 
> 261895 unexpected messages queued.
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fffd052b9f4, count=1, MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fff58fe5b74, count=1, MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fff6fae19f4, count=1, MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fff55bc8e74, count=1, MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0

I'm using MPICH2 1.4.1p1 on a Linux x64 machine (an AWS EC2 instance). The
last execution that failed this way had 63 worker processes sending all of
their output to a single receiver/writer process.

The program and the number of messages sent/received are pretty much the same
as in 2013. The only explanation I can think of is that today's EC2 instances
have processors that are proportionally faster, relative to network/IO speed,
than the 2013 instances, so the writer process gets "flooded" with messages
sooner. Does that make sense?
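For reference, the receiver loop follows this general pattern. This is only a
minimal sketch of the approach described above, written in C; the function
name, buffer count, message length, and tags are made up for illustration, and
it is not the attached snippet.

#include <mpi.h>

#define NBUF     64     /* number of pre-posted receive buffers (made up)      */
#define MSGLEN   1024   /* message length in integers (made up)                */
#define STOP_TAG 2      /* tag a worker uses to say it has finished (made up)  */

/* Writer/receiver process: keep NBUF receives posted at all times and
 * re-post each buffer as soon as its message has been handled. */
static void writer_loop(int nworkers)
{
    static int  buf[NBUF][MSGLEN];
    MPI_Request req[NBUF];
    MPI_Status  status;
    int         i, idx, finished = 0;

    /* Pre-post the whole pool of receives. */
    for (i = 0; i < NBUF; i++)
        MPI_Irecv(buf[i], MSGLEN, MPI_INT, MPI_ANY_SOURCE,
                  MPI_ANY_TAG, MPI_COMM_WORLD, &req[i]);

    while (finished < nworkers) {
        /* Block until any of the posted receives completes. */
        MPI_Waitany(NBUF, req, &idx, &status);

        if (status.MPI_TAG == STOP_TAG) {
            finished++;                 /* one worker is done */
        } else {
            /* ... write buf[idx] to the output file here ... */
        }

        /* Re-post the buffer so the pool stays full. */
        MPI_Irecv(buf[idx], MSGLEN, MPI_INT, MPI_ANY_SOURCE,
                  MPI_ANY_TAG, MPI_COMM_WORLD, &req[idx]);
    }
}

The point of the pool is that a matching receive is usually already posted
when a worker's message arrives; once the writer falls behind and all of the
buffers are in use, further arrivals go onto MPICH's unexpected-message queue,
which is what the error above reports.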
Could you please give me some advice on how to solve this issue?

Best regards,
Luiz

On 13 March 2014 at 16:01, Luiz Carlos da Costa Junior <lcjunior@ufrj.br> wrote:

> Thanks again Kenneth, I was able to solve it using MPI_TESTANY.
> Regards, Luiz
>
> On 13 March 2014 15:35, Kenneth Raffenetti <raffenet@mcs.anl.gov> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On 03/13/2014 12:35 PM, Luiz Carlos da Costa Junior wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
Does anyone have any clue about this?<br>
<br>
Thanks in advance.<br>
<br>
<br>
On 12 March 2014 14:40, Luiz Carlos da Costa Junior <<a href="mailto:lcjunior@ufrj.br" target="_blank">lcjunior@ufrj.br</a><br></div><div>
<mailto:<a href="mailto:lcjunior@ufrj.br" target="_blank">lcjunior@ufrj.br</a>>> wrote:<br>
<br>
Dear Kenneth,<br>
<br>
Thanks for your quick reply.<br>
I tested your suggestion and, unfortunately, this approach didn't work.<br>
<br>
Question: when I call MPI_IPROBE it accounts also for the messages<br>
that were already received asynchronously?<br>
</div></blockquote>
>>
>> That should not be the case. If a message has been matched by a recv/irecv,
>> MPI_Probe should not match it again.
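For illustration only (this is not part of the quoted reply), that behaviour
can be checked with a toy two-rank program; the tag and payload below are
arbitrary:

#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends a single message; rank 1 receives it with MPI_Irecv and then
 * probes. Once the message has been matched and completed, MPI_Iprobe should
 * report that nothing is pending. */
int main(int argc, char **argv)
{
    int rank, value = 42, flag = -1;
    MPI_Request req;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);            /* the message is matched here */

        MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, &status);
        /* Expected: flag == 0, because the only message in flight was
         * already matched by the MPI_Irecv above. */
        printf("pending message after the recv? %s\n", flag ? "yes" : "no");
    }

    MPI_Finalize();
    return 0;
}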
>>> Is there any way to know, for my list of mpi_requests (from my MPI_IRECV's),
>>> which ones are still "open" and which ones have already received messages?
>>
>> MPI_Test will take a request as an argument and tell you whether or not that
>> requested operation has completed.
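For example (an illustrative fragment with made-up names, not part of the
quoted reply); MPI_Testany performs the same check across a whole array of
requests, which is what fits the receive pool sketched earlier:

#include <mpi.h>

/* Non-blocking check of a single pre-posted request. */
static int message_arrived(MPI_Request *req, MPI_Status *status)
{
    int flag = 0;
    MPI_Test(req, &flag, status);   /* flag becomes 1 only if *req has completed */
    return flag;
}

/* Non-blocking check across an array of pre-posted requests: when it
 * returns 1, *which holds the index of the completed request. */
static int any_message_arrived(int n, MPI_Request reqs[], int *which,
                               MPI_Status *status)
{
    int flag = 0;
    MPI_Testany(n, reqs, which, &flag, status);
    return flag;
}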
>>
>> Ken
>>
>> _______________________________________________
>> discuss mailing list    discuss@mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss