[mpich-discuss] Error: failed to allocate memory for an unexpected message

XingFENG xingfeng at cse.unsw.edu.au
Thu Oct 2 02:13:54 CDT 2014


Hi Wesley Bland,

Thanks for your reply.

I have modified my code so that each process first receives messages from
the other processes and then sends its own messages to them. However, the
same error still appears.
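Receiving before sending does not by itself bound the unexpected-message
queue. If one rank sends faster than its peer matches the messages, every
message that arrives before a matching receive is posted still has to be
buffered by the MPI library. A common remedy, not discussed in this thread,
is credit-based flow control: the sender keeps at most a fixed number of
unmatched messages outstanding and waits for the receiver to return credits.
The sketch below is a toy tick-based simulation, not MPI code. The
`peak_backlog` function, the fixed per-tick rates and the instantaneous
acknowledgements are all assumptions made for illustration. In a real MPI
program the credits could be small acknowledgement messages, or the window
could be enforced by making every k-th send synchronous (`MPI_Ssend` or
`MPI_Issend`).

```python
from typing import Optional


def peak_backlog(total: int, send_rate: int, recv_rate: int,
                 window: Optional[int] = None) -> int:
    """Peak number of messages sent but not yet consumed by the receiver.

    Hypothetical toy model, not MPI code. Each tick the sender sends up to
    `send_rate` messages, then the receiver consumes up to `recv_rate`.
    Assumes the receiver posts a receive only when it is ready to process a
    message, so every sent-but-unconsumed message is an unexpected message.
    window=None: eager sending with no flow control.
    window=k: credit-based flow control with at most k messages outstanding;
    each consumed message returns one credit immediately (no ack latency).
    """
    if send_rate < 1 or recv_rate < 1 or (window is not None and window < 1):
        raise ValueError("rates and window must be positive")
    sent = consumed = peak = 0
    while consumed < total:
        # Sender: use this tick's send budget, capped by remaining credits.
        budget = send_rate
        if window is not None:
            budget = min(budget, window - (sent - consumed))
        sent += min(budget, total - sent)
        # Record the high-water mark of buffered (unexpected) messages.
        peak = max(peak, sent - consumed)
        # Receiver: consume what it can this tick, returning credits.
        consumed += min(recv_rate, sent - consumed)
    return peak


if __name__ == "__main__":
    # The sender is 10x faster than the receiver.
    print("no flow control:", peak_backlog(1000, 10, 1))            # 901
    print("window of 8:    ", peak_backlog(1000, 10, 1, window=8))  # 8
```

With the sender ten times faster than the receiver, the backlog grows to 901
of the 1000 messages without flow control, but never exceeds the window of 8
with it, at the cost of the sender waiting for credits.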

I also noticed that the code works fine on a single-node machine; it only
crashes with this error on a multi-node cluster.


On Sun, Sep 28, 2014 at 10:44 PM, Wesley Bland <wbland at anl.gov> wrote:

> The problem in this situation usually is that you're not posting enough
> receives and too many of your messages are getting buffered by the MPI
> library. Make sure you match up your sends and receives and whenever
> possible you post your receives early.
>
> Wesley
>
>
>
> > On Sep 28, 2014, at 7:13 AM, XingFENG <xingfeng at cse.unsw.edu.au> wrote:
> >
> > Hi all,
> >
> > I am running an MPI program on two machines and get the following errors:
> >
> >
> > ====================================================================
> > Fatal error in MPI_Test: Other MPI error, error stack:
> > MPI_Test(153)......................: MPI_Test(request=0xa0a088, flag=0x7fff470e86fc,  status=0x7fff470e86e0) failed
> > MPIDI_CH3I_Progress(150)...........:
> > MPID_nem_mpich2_test_recv(800).....:
> > MPID_nem_tcp_connpoll(1720)........:
> > state_commrdy_handler(1556)........:
> > MPID_nem_tcp_recv_handler(1459)....:
> > MPID_nem_handle_pkt(493)...........:
> > MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261892 unexpected messages queued.
> > Fatal error in MPI_Test: Other MPI error, error stack:
> > MPI_Test(153)......................: MPI_Test(request=0xadb128, flag=0x7fff33cba448, status=0x7fff33cba430) failed
> > MPIDI_CH3I_Progress(150)...........:
> > MPID_nem_mpich2_test_recv(800).....:
> > MPID_nem_tcp_connpoll(1720)........:
> > state_commrdy_handler(1556)........:
> > MPID_nem_tcp_recv_handler(1459)....:
> > MPID_nem_handle_pkt(493)...........:
> > MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261890 unexpected messages queued.
> > rank 1 in job 11  slave_36134   caused collective abort of all ranks
> >   exit status of rank 1: killed by signal 9
> >
> > ====================================================================
> >
> >
> > I have never seen such errors before. What is the cause of this error? Is it an out-of-memory error? (20% of the memory on the machines is still free.)
> >
> > Any help would be greatly appreciated. Thanks in advance!
> >
> >
> > --
> > Best Regards.
> > ---
> > Xing FENG
> > PhD Candidate
> > Database Research Group
> >
> > School of Computer Science and Engineering
> > University of New South Wales
> > NSW 2052, Sydney
> >
> > Phone: (+61) 413 857 288
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
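Wesley's explanation can be made concrete with a toy model of MPI message
matching for a single sender, receiver and tag. An arriving message either
matches an already-posted receive or is buffered on the unexpected-message
queue (the queue whose length MPICH reports in the error above), and a newly
posted receive either takes a buffered message or waits for one. This is a
hypothetical simplification for illustration: the `Matcher` class and the two
schedule functions are made-up names, and real MPI matching also takes the
source, tag and communicator into account.

```python
from dataclasses import dataclass


@dataclass
class Matcher:
    """Toy model of MPI message matching for one sender/receiver pair and tag.

    Hypothetical simplification for illustration only; not MPICH's code.
    """
    posted: int = 0           # receives posted but not yet matched
    unexpected: int = 0       # messages arrived but not yet matched (buffered)
    peak_unexpected: int = 0  # high-water mark of the unexpected queue

    def message_arrives(self) -> None:
        if self.posted > 0:
            self.posted -= 1      # matches a pre-posted receive: no buffering
        else:
            self.unexpected += 1  # no receive yet: the library must buffer it
            self.peak_unexpected = max(self.peak_unexpected, self.unexpected)

    def receive_posted(self) -> None:
        if self.unexpected > 0:
            self.unexpected -= 1  # drains one buffered message
        else:
            self.posted += 1      # waits for a future message


def late_receives(n: int) -> int:
    """All n messages arrive before any receive is posted."""
    m = Matcher()
    for _ in range(n):
        m.message_arrives()
    for _ in range(n):
        m.receive_posted()
    return m.peak_unexpected


def early_receives(n: int) -> int:
    """All n receives are posted before any message arrives."""
    m = Matcher()
    for _ in range(n):
        m.receive_posted()
    for _ in range(n):
        m.message_arrives()
    return m.peak_unexpected


if __name__ == "__main__":
    n = 100_000
    print("late receives:  peak unexpected =", late_receives(n))   # 100000
    print("early receives: peak unexpected =", early_receives(n))  # 0
```

Posting the receives first keeps the unexpected queue empty, while posting
them late makes it grow by one entry per message, which is one way a count
in the hundreds of thousands can build up.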



-- 
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group

School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney

Phone: (+61) 413 857 288