[mpich-discuss] Error: failed to allocate memory for an unexpected message

Wesley Bland wbland at anl.gov
Thu Oct 2 05:50:09 CDT 2014


Can you provide a minimal example code that reproduces the problem?
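
For reference, the failure mode described in the quoted advice below can be illustrated without MPI at all. What follows is a toy Python model, not MPI code and not taken from the program in question: one sender, one slower receiver, and a queue standing in for the library's unexpected-message buffer. The function name `peak_unexpected` and its parameters `recv_every` and `window` are made up for this sketch, and the one-message-per-step timing is an assumption chosen only to keep the model small.

```python
from collections import deque


def peak_unexpected(num_messages, recv_every, window=None):
    """Return the peak length of a simulated unexpected-message queue.

    num_messages: how many messages the sender wants to send.
    recv_every:   the receiver matches one message every `recv_every` steps,
                  i.e. it is slower than the sender when recv_every > 1.
    window:       maximum number of sent-but-unmatched messages the sender
                  allows itself; None means the sender is never throttled.
    All three names are hypothetical and exist only in this model.
    """
    unexpected = deque()  # messages that arrived before a matching receive
    sent = received = peak = step = 0

    while received < num_messages:
        step += 1

        # Sender: tries to send one message per step (eager-style), unless a
        # window is set and that many messages are already unmatched.
        throttled = window is not None and (sent - received) >= window
        if sent < num_messages and not throttled:
            unexpected.append(sent)
            sent += 1

        peak = max(peak, len(unexpected))

        # Receiver: matches the oldest queued message only every few steps.
        if step % recv_every == 0 and unexpected:
            unexpected.popleft()
            received += 1

    return peak


if __name__ == "__main__":
    print("unthrottled peak:", peak_unexpected(1000, recv_every=4))
    print("windowed peak:   ", peak_unexpected(1000, recv_every=4, window=8))
```

In this model the unthrottled sender lets the queue grow with the number of messages, while the windowed sender never has more than `window` messages queued. In a real program the role of `window` might be played by synchronous sends (MPI_Ssend or MPI_Issend), which do not complete until the matching receive has started, or by an application-level acknowledgement every few messages. Posting receives early with MPI_Irecv helps from the other side, because a message that finds a posted receive does not need to be buffered as unexpected.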



> On Oct 2, 2014, at 2:13 AM, XingFENG <xingfeng at cse.unsw.edu.au> wrote:
> 
> Hi Wesley Bland,
> 
> Thanks for your reply.
> 
> I have modified my code so that each process first receives messages from the others and then sends messages to them. However, the same error still appears.
> 
> I also noticed that the code works fine on a single-node machine. It crashes with this error on a multi-node cluster.
> 
> 
>> On Sun, Sep 28, 2014 at 10:44 PM, Wesley Bland <wbland at anl.gov> wrote:
>> The problem in this situation is usually that you're not posting enough receives, so too many of your messages are getting buffered by the MPI library. Make sure you match up your sends and receives and, whenever possible, post your receives early.
>> 
>> Wesley
>> 
>> 
>> 
>> > On Sep 28, 2014, at 7:13 AM, XingFENG <xingfeng at cse.unsw.edu.au> wrote:
>> >
>> > Hi all,
>> >
>> > I am running an MPI program on two machines. I got the following errors:
>> >
>> >
>> > ====================================================================
>> > Fatal error in MPI_Test: Other MPI error, error stack:
>> > MPI_Test(153)......................: MPI_Test(request=0xa0a088, flag=0x7fff470e86fc,  status=0x7fff470e86e0) failed
>> > MPIDI_CH3I_Progress(150)...........:
>> > MPID_nem_mpich2_test_recv(800).....:
>> > MPID_nem_tcp_connpoll(1720)........:
>> > state_commrdy_handler(1556)........:
>> > MPID_nem_tcp_recv_handler(1459)....:
>> > MPID_nem_handle_pkt(493)...........:
>> > MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261892 unexpected messages queued.
>> > Fatal error in MPI_Test: Other MPI error, error stack:
>> > MPI_Test(153)......................: MPI_Test(request=0xadb128, flag=0x7fff33cba448, status=0x7fff33cba430) failed
>> > MPIDI_CH3I_Progress(150)...........:
>> > MPID_nem_mpich2_test_recv(800).....:
>> > MPID_nem_tcp_connpoll(1720)........:
>> > state_commrdy_handler(1556)........:
>> > MPID_nem_tcp_recv_handler(1459)....:
>> > MPID_nem_handle_pkt(493)...........:
>> > MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261890 unexpected messages queued.
>> > rank 1 in job 11  slave_36134   caused collective abort of all ranks
>> >   exit status of rank 1: killed by signal 9
>> >
>> > ====================================================================
>> >
>> >
>> > I have never seen such errors before. What is the cause of this error? Is it an out-of-memory error? (20% of the memory on the machines is still free.)
>> >
>> > Any help would be greatly appreciated. Thanks in advance!
>> >
>> >
>> > --
>> > Best Regards.
>> > ---
>> > Xing FENG
>> > PhD Candidate
>> > Database Research Group
>> >
>> > School of Computer Science and Engineering
>> > University of New South Wales
>> > NSW 2052, Sydney
>> >
>> > Phone: (+61) 413 857 288
>> > _______________________________________________
>> > discuss mailing list     discuss at mpich.org
>> > To manage subscription options or unsubscribe:
>> > https://lists.mpich.org/mailman/listinfo/discuss
> 
> 
> 
> -- 
> Best Regards.
> ---
> Xing FENG
> PhD Candidate
> Database Research Group
> 
> School of Computer Science and Engineering
> University of New South Wales
> NSW 2052, Sydney
> 
> Phone: (+61) 413 857 288