[mpich-discuss] Error: failed to allocate memory for an unexpected message

Wesley Bland wbland at anl.gov
Sun Sep 28 07:44:20 CDT 2014


The usual cause of this error is that you're not posting enough receives, so too many incoming messages end up buffered by the MPI library as unexpected messages. Make sure your sends and receives match up, and post your receives as early as possible.

Wesley



> On Sep 28, 2014, at 7:13 AM, XingFENG <xingfeng at cse.unsw.edu.au> wrote:
> 
> Hi all,
> 
> I am running an MPI program on two machines and got the following errors:
> 
> 
> ====================================================================
> Fatal error in MPI_Test: Other MPI error, error stack:
> MPI_Test(153)......................: MPI_Test(request=0xa0a088, flag=0x7fff470e86fc,  status=0x7fff470e86e0) failed
> MPIDI_CH3I_Progress(150)...........: 
> MPID_nem_mpich2_test_recv(800).....: 
> MPID_nem_tcp_connpoll(1720)........: 
> state_commrdy_handler(1556)........: 
> MPID_nem_tcp_recv_handler(1459)....: 
> MPID_nem_handle_pkt(493)...........: 
> MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261892 unexpected messages queued.
> Fatal error in MPI_Test: Other MPI error, error stack:
> MPI_Test(153)......................: MPI_Test(request=0xadb128, flag=0x7fff33cba448, status=0x7fff33cba430) failed
> MPIDI_CH3I_Progress(150)...........: 
> MPID_nem_mpich2_test_recv(800).....: 
> MPID_nem_tcp_connpoll(1720)........: 
> state_commrdy_handler(1556)........: 
> MPID_nem_tcp_recv_handler(1459)....: 
> MPID_nem_handle_pkt(493)...........: 
> MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261890 unexpected messages queued.
> rank 1 in job 11  slave_36134   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9 
> 
> ====================================================================
> 
> 
> I have never seen such errors before. What is the cause of this error? Is it an out-of-memory error? (About 20% of the memory on the machines is still free.)
> 
> Any help would be greatly appreciated. Thanks in advance!
> 
> 
> -- 
> Best Regards.
> ---
> Xing FENG
> PhD Candidate
> Database Research Group
> 
> School of Computer Science and Engineering
> University of New South Wales
> NSW 2052, Sydney
> 
> Phone: (+61) 413 857 288
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss