[mpich-discuss] Failed to allocate memory for an unexpected message
Halim Amer
aamer at anl.gov
Thu Jul 2 16:09:51 CDT 2015
Hi Luiz,
Please use the latest MPICH. The one you are using is very old.
--Halim
Abdelhalim Amer (Halim)
Postdoctoral Appointee
MCS Division
Argonne National Laboratory
On 7/2/15 1:22 PM, Luiz Carlos da Costa Junior wrote:
> Hello all,
>
> In 2013 I had problems regarding the allocation of unexpected messages
> in MPI.
> After your kind assistance, I implemented a "buffer" matrix in the
> receiver process, using MPI_IRECV, MPI_WAITANY and MPI_TESTANY functions
> (the code snippet is attached).
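The attached snippet does not appear in the archive, so what follows is only a
minimal C sketch of the pattern described: a pool of pre-posted MPI_Irecv
buffers on the writer rank, drained with MPI_Waitany. NBUF, MSGLEN and
TAG_DONE are assumptions, not values from the original code.

    /* Sketch only -- not the attached snippet.  NBUF, MSGLEN, TAG_DONE are
     * hypothetical. */
    #include <mpi.h>

    #define NBUF     64      /* size of the "buffer" matrix (assumption)        */
    #define MSGLEN   1024    /* maximum message length in integers (assumption) */
    #define TAG_DONE 999     /* hypothetical "worker finished" tag              */

    int main(int argc, char **argv)
    {
        int rank, size, i, idx, finished = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                       /* receiver/writer process */
            static int  bufs[NBUF][MSGLEN];
            MPI_Request reqs[NBUF];
            MPI_Status  status;

            /* Pre-post one receive per slot so incoming messages land in
             * application buffers instead of the unexpected-message queue. */
            for (i = 0; i < NBUF; i++)
                MPI_Irecv(bufs[i], MSGLEN, MPI_INT, MPI_ANY_SOURCE,
                          MPI_ANY_TAG, MPI_COMM_WORLD, &reqs[i]);

            while (finished < size - 1) {
                /* Block until any slot completes, consume it, re-post it. */
                MPI_Waitany(NBUF, reqs, &idx, &status);
                if (status.MPI_TAG == TAG_DONE)
                    finished++;
                /* else: write bufs[idx] to the output file here */
                MPI_Irecv(bufs[idx], MSGLEN, MPI_INT, MPI_ANY_SOURCE,
                          MPI_ANY_TAG, MPI_COMM_WORLD, &reqs[idx]);
            }
            /* Cancel and complete the receives that are still pending. */
            for (i = 0; i < NBUF; i++) {
                MPI_Cancel(&reqs[i]);
                MPI_Wait(&reqs[i], MPI_STATUS_IGNORE);
            }
        } else {                               /* worker processes */
            int payload[MSGLEN] = {0};
            /* ... fill payload and send as many records as needed ... */
            MPI_Send(payload, MSGLEN, MPI_INT, 0, 0, MPI_COMM_WORLD);
            MPI_Send(payload, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Re-posting each completed slot immediately keeps NBUF receives outstanding at
all times, so messages only start landing in the unexpected-message queue once
every slot is already busy.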
>
> It has been working nicely since then until recently, when I faced the
> same problems again:
>
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186)......................: MPI_Recv(buf=0x7fffe8dd5974,
> count=1, MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> status=0xd213d0) failed
> MPIDI_CH3I_Progress(402)...........:
> MPID_nem_mpich2_blocking_recv(905).:
> MPID_nem_tcp_connpoll(1838)........:
> state_commrdy_handler(1676)........:
> MPID_nem_tcp_recv_handler(1564)....:
> MPID_nem_handle_pkt(636)...........:
> MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for
> an unexpected message. 261895 unexpected messages queued.
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fffd052b9f4, count=1,
> MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fff58fe5b74, count=1,
> MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fff6fae19f4, count=1,
> MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
> Fatal error in MPI_Recv: Other MPI error, error stack:
> MPI_Recv(186).............: MPI_Recv(buf=0x7fff55bc8e74, count=1,
> MPI_INTEGER, src=0, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> status=0xd213d0) failed
> dequeue_and_set_error(596): Communication error with rank 0
>
>
> I'm using MPICH2 1.4.1p1 on a Linux x64 machine (AWS EC2 instance).
> The last execution with this error had 63 working processes sending all
> the output to just one receiver/writer process.
>
> The program and the number of messages sent/received are pretty much
> the same. The only explanation I can think of is that today's processor
> is proportionally faster than the network/IO speed compared to a 2013
> AWS EC2 instance, so the writer process gets "flooded" with messages
> sooner. Does that make sense?
>
> Could you please give some advice on how to solve this issue?
>
> Best regards,
> Luiz
>
> On 13 March 2014 at 16:01, Luiz Carlos da Costa Junior <lcjunior at ufrj.br
> <mailto:lcjunior at ufrj.br>> wrote:
>
> Thanks again Kenneth, I was able to solve it using MPI_TESTANY.
> Regards, Luiz
>
>
> On 13 March 2014 15:35, Kenneth Raffenetti <raffenet at mcs.anl.gov
> <mailto:raffenet at mcs.anl.gov>> wrote:
>
> On 03/13/2014 12:35 PM, Luiz Carlos da Costa Junior wrote:
>
> Does anyone have any clue about this?
>
> Thanks in advance.
>
>
> On 12 March 2014 14:40, Luiz Carlos da Costa Junior
> <lcjunior at ufrj.br <mailto:lcjunior at ufrj.br>
> <mailto:lcjunior at ufrj.br <mailto:lcjunior at ufrj.br>>> wrote:
>
> Dear Kenneth,
>
> Thanks for your quick reply.
> I tested your suggestion and, unfortunately, this
> approach didn't work.
>
> Question: when I call MPI_IPROBE, does it also account for messages
> that were already received asynchronously?
>
>
> That should not be the case. If a message has been matched by a
> recv/irecv, MPI_Probe should not match it again.
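A tiny two-rank sketch of that point (hypothetical, not from the thread): once
a message has been matched by a pre-posted MPI_Irecv, MPI_Iprobe no longer
sees it.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int val = 0, flag;
            MPI_Request req;
            MPI_Status  status;

            /* Pre-post the receive before the message arrives. */
            MPI_Irecv(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, &status);          /* message is matched here  */

            MPI_Iprobe(1, 0, MPI_COMM_WORLD, &flag, &status);
            printf("flag = %d\n", flag);      /* 0: nothing left to probe */
        } else if (rank == 1) {
            int val = 42;
            MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }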
>
>
>
> Is there any way to know, for my list of MPI requests (from my
> MPI_IRECV calls), which ones are still "open" and which ones have
> received messages?
>
>
> MPI_Test will take a request as an argument and tell you whether
> or not that requested operation has been completed.
>
> Ken
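A hedged sketch of how Ken's suggestion extends to a whole pool of receives:
MPI_Testsome, a close relative of MPI_Test, reports in one call which requests
in the array have completed and leaves the still-open ones untouched. The
function name, array names and message length below are assumptions.

    #include <mpi.h>

    #define MSGLEN 1024   /* hypothetical per-message length in integers */

    /* Report which of the nreq outstanding receives already hold a message.
     * Completed slots are returned in indices[0..outcount-1]; every other
     * request is still "open" (no message matched yet). */
    static void drain_completed(int nreq, MPI_Request reqs[],
                                int bufs[][MSGLEN])
    {
        int outcount;
        int indices[nreq];                /* C99 variable-length arrays */
        MPI_Status statuses[nreq];

        MPI_Testsome(nreq, reqs, &outcount, indices, statuses);
        for (int i = 0; i < outcount; i++) {
            int idx = indices[i];
            /* bufs[idx] now holds a received message: process it and, if
             * more traffic is expected, re-post MPI_Irecv on slot idx. */
            (void)bufs[idx];
        }
    }

MPI_Testany, which Luiz reports using in his later reply above, is the
single-completion variant of the same idea: it returns at most one completed
request per call.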
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss