[mpich-discuss] Failed to allocate memory for an unexpected message

Luiz Carlos da Costa Junior lcjunior at ufrj.br
Thu Oct 31 12:14:49 CDT 2013


Dear Pavan,

Thanks for your reply.

I noticed that the same numbers of unexpected messages (261895 or 261894)
appear every time.
I am using an AWS machine with 60 GB of RAM to run my application. We
could see during the execution that more than 30 GB were still available.
Is this an internal limitation of the MPICH implementation? Would it be
possible to recompile MPICH with a different parameter?
Do you know exactly where this limitation occurs?

Best regards,


On 24 October 2013 00:19, Pavan Balaji <balaji at mcs.anl.gov> wrote:

> Luiz,
>
> You don't need to know how many messages you will receive.  You only need
> to make sure that whenever a message comes in, the library knows which
> buffer it should go into.  One way to do that is to post N receives with
> ANY_SOURCE and ANY_TAG:
>
> MPI_Irecv(.., MPI_ANY_SOURCE, MPI_ANY_TAG, ..);
>
> Then test to see which one completed.  Whenever a message comes in,
> process it and repost that slot as another Irecv.  In this case, you
> should not have any unexpected messages (unless you have multiple
> communicators, which is a different story).
>
>   -- Pavan
>
> On Oct 23, 2013, at 4:22 PM, Luiz Carlos da Costa Junior <lcjunior at ufrj.br>
> wrote:
>
> > Hi Antonio,
> >
> > Thanks for your quick reply.
> >
> > I confess I have to study this a bit more, but I think I understood your
> > suggestion.
> > After a little research, I understood that when I use MPI_Irecv I am
> > performing a non-blocking operation that just tells MPI I am ready to
> > receive a message, and then, after MPI_Waitany returns, I can access the
> > data and write it to the file. In this kind of implementation, I declare
> > up front that I have a bunch of messages to receive from the worker
> > processes and, because of that, I have to keep a buffer in the writer
> > process to receive messages asynchronously. The higher efficiency is
> > achieved because, while the writer process is performing the I/O
> > operation (writing to hard disk), MPI can meanwhile transfer the received
> > data into my pre-allocated buffer in my application. Is all this right?
> >
> > But what if I don't know the number of messages I will receive?
> > That's my case, actually... I don't even know which worker process will
> > send me data, and I also don't know how many times each one of them will
> > send me messages. I use a master-slave scheme to distribute tasks, so the
> > number of calculations done in each worker process (and thus the amount
> > of data sent to the writer process) depends on the speed of each worker
> > and how many times it asks for tasks. It is easy to deal with messages
> > coming from an unknown sender, but I don't know how to deal with an
> > unknown number of messages (i.e., how many MPI_Irecv calls do I have to
> > make?). Any idea?
> >
> > As I said before, I considered having more than one writer process, but
> > I can't see how this would solve the problem, which seems to be related
> > to disk I/O speed. In other words, why would it be worth having more than
> > one writer process if, in the end, I have only one hard disk to perform
> > the I/O operations?
> >
> > Thanks again.
> >
> > Regards, Luiz
> >
> >
> > On 23 October 2013 17:42, Antonio J. Peña <apenya at mcs.anl.gov> wrote:
> >
> > Hi Luiz,
> >
> > Your error trace indicates that the receiver ran out of memory due to
> > the very large number (261,895) of eager unexpected messages received,
> > i.e., small messages that arrived without a matching receive operation
> > posted. Whenever this happens, the receiver allocates a temporary buffer
> > to hold the received message. This exhausted the available memory on the
> > computer where the receiver was executing.
> >
> > To avoid this, try to pre-post receives before messages arrive; this is
> > also far more efficient. Maybe you could post an MPI_Irecv per worker in
> > your writer process, and process them after an MPI_Waitany. You may also
> > consider having multiple writer processes if your use case permits it and
> > the volume of received messages is too high to be processed by a single
> > writer.
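> >
> > For instance, something along these lines, with one pre-posted receive
> > dedicated to each worker rank (just a sketch; the writer rank, buffer
> > sizes, and datatype below are assumptions, not taken from your code):
> >
> > #include <mpi.h>
> >
> > #define MAX_WORKERS 256   /* assumed upper bound on number of workers */
> > #define MAXLEN      4096  /* assumed upper bound on message size      */
> >
> > /* Writer assumed to be rank 0; slot w holds the pending receive for
> >  * worker rank w+1. */
> > void writer_loop_per_worker(MPI_Comm comm)
> > {
> >     static char bufs[MAX_WORKERS][MAXLEN];
> >     MPI_Request reqs[MAX_WORKERS];
> >     MPI_Status  status;
> >     int size, nworkers, w, idx;
> >
> >     MPI_Comm_size(comm, &size);
> >     nworkers = size - 1;
> >
> >     for (w = 0; w < nworkers; w++)
> >         MPI_Irecv(bufs[w], MAXLEN, MPI_CHAR, w + 1, MPI_ANY_TAG,
> >                   comm, &reqs[w]);
> >
> >     for (;;) {
> >         MPI_Waitany(nworkers, reqs, &idx, &status);
> >         /* ... write the completed message in bufs[idx] to disk ... */
> >         MPI_Irecv(bufs[idx], MAXLEN, MPI_CHAR, idx + 1, MPI_ANY_TAG,
> >                   comm, &reqs[idx]);
> >     }
> > }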
> >
> > Antonio
> >
> >
> > On Wednesday, October 23, 2013 05:27:27 PM Luiz Carlos da Costa Junior wrote:
> > Hi,
> >
> > I am getting the following error when running my parallel application:
> >
> > MPI_Recv(186)......................: MPI_Recv(buf=0x125bd840, count=2060, MPI_CHARACTER, src=24, tag=94, comm=0x84000002, status=0x125fcff0) failed
> > MPIDI_CH3I_Progress(402)...........:
> > MPID_nem_mpich2_blocking_recv(905).:
> > MPID_nem_tcp_connpoll(1838)........:
> > state_commrdy_handler(1676)........:
> > MPID_nem_tcp_recv_handler(1564)....:
> > MPID_nem_handle_pkt(636)...........:
> > MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for an unexpected message. 261895 unexpected messages queued.
> > Fatal error in MPI_Send: Other MPI error, error stack:
> > MPI_Send(173)..............: MPI_Send(buf=0x765d2e60, count=2060, MPI_CHARACTER, dest=0, tag=94, comm=0x84000004) failed
> > MPID_nem_tcp_connpoll(1826): Communication error with rank 1: Connection reset by peer
> >
> > I went to MPICH's FAQ (
> > http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_am_I_getting_so_many_unexpected_messages.3F).
> > It says that most likely the receiver process cannot keep up with the
> > high number of messages it is receiving.
> >
> > In my application, the worker processes perform a very large number of
> > small computations and, after a computation is complete, they send the
> > data to a special "writer" process that is responsible for writing the
> > output to disk.
> > This scheme used to work in a very reasonable fashion, until we faced
> > some new data with larger parameters that caused the problem above.
> >
> > Even though we could redesign the application, for example by creating a
> > pool of writer processes, we would still have only one hard disk, so the
> > bottleneck would not be removed. This doesn't seem to be a good approach.
> >
> > As far as I understand, MPICH saves the content of every MPI_SEND in an
> > internal buffer (I don't know where this buffer is located, on the sender
> > or the receiver?) to let the sender continue computing asynchronously
> > while the messages are being received.
> > The problem is that this buffer has been exhausted due to some resource
> > limitation.
> >
> > Having such a buffer is very useful, but if the buffer in the writer
> > process is close to its limit, the worker processes should stop and wait
> > until some space is freed before they resume sending new data to be
> > written to disk.
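> >
> > For example, I imagine something like a synchronous send on the worker
> > side would give this behaviour, since MPI_Ssend only completes once the
> > writer has started the matching receive (this is just a guess on my
> > part; WRITER_RANK and TAG_DATA below are placeholder names):
> >
> > #include <mpi.h>
> >
> > #define WRITER_RANK 0    /* placeholder: rank of the writer process */
> > #define TAG_DATA    94   /* placeholder: tag used for result data   */
> >
> > /* MPI_Ssend blocks until the writer has matched this message, so a slow
> >  * writer throttles the workers instead of accumulating unexpected
> >  * messages on its side. */
> > void send_result_to_writer(const char *buf, int len, MPI_Comm comm)
> > {
> >     MPI_Ssend((void *)buf, len, MPI_CHAR, WRITER_RANK, TAG_DATA, comm);
> > }
> >
> > Would something like that be a reasonable way to throttle the workers?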
> >
> > Is it possible to check this buffer in MPICH? Or is it possible to check
> > the number of messages to be received?
> > Can anyone suggest a better (easy to implement) solution?
> >
> > Thanks in advance.
> >
> > Regards,
> > Luiz
> >
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>