[mpich-discuss] Error: failed to allocate memory for an unexpected message
XingFENG
xingfeng at cse.unsw.edu.au
Thu Oct 2 05:57:29 CDT 2014
Hi Wesley Bland,
Thanks for your reply.
My code is relatively big (around 2000 lines). I will try to put together
and post a small example later.
On Thu, Oct 2, 2014 at 8:50 PM, Wesley Bland <wbland at anl.gov> wrote:
> Can you provide a minimal example code that reproduced the problem?
>
>
>
> On Oct 2, 2014, at 2:13 AM, XingFENG <xingfeng at cse.unsw.edu.au> wrote:
>
> Hi Wesley Bland,
>
> Thanks for your reply.
>
> I have modified my code. Each process now first receives messages from
> the others and then sends messages to them. However, the same error
> still appears.
>
> I also noticed that the code works fine on a single-node machine. It
> crashes with this error on a multi-node cluster.
>
>
> On Sun, Sep 28, 2014 at 10:44 PM, Wesley Bland <wbland at anl.gov> wrote:
>
>> The problem in this situation usually is that you're not posting enough
>> receives and too many of your messages are getting buffered by the MPI
>> library. Make sure you match up your sends and receives and whenever
>> possible you post your receives early.
>>
>> Wesley
>>
>>
>>
>> > On Sep 28, 2014, at 7:13 AM, XingFENG <xingfeng at cse.unsw.edu.au> wrote:
>> >
>> > Hi all,
>> >
>> > I am running an MPI program on two machines. I got the following errors:
>> >
>> >
>> > ====================================================================
>> > Fatal error in MPI_Test: Other MPI error, error stack:
>> > MPI_Test(153)......................: MPI_Test(request=0xa0a088, flag=0x7fff470e86fc, status=0x7fff470e86e0) failed
>> > MPIDI_CH3I_Progress(150)...........:
>> > MPID_nem_mpich2_test_recv(800).....:
>> > MPID_nem_tcp_connpoll(1720)........:
>> > state_commrdy_handler(1556)........:
>> > MPID_nem_tcp_recv_handler(1459)....:
>> > MPID_nem_handle_pkt(493)...........:
>> > MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261892 unexpected messages queued.
>> > Fatal error in MPI_Test: Other MPI error, error stack:
>> > MPI_Test(153)......................: MPI_Test(request=0xadb128, flag=0x7fff33cba448, status=0x7fff33cba430) failed
>> > MPIDI_CH3I_Progress(150)...........:
>> > MPID_nem_mpich2_test_recv(800).....:
>> > MPID_nem_tcp_connpoll(1720)........:
>> > state_commrdy_handler(1556)........:
>> > MPID_nem_tcp_recv_handler(1459)....:
>> > MPID_nem_handle_pkt(493)...........:
>> > MPIDI_CH3_PktHandler_EagerSend(589): Failed to allocate memory for an unexpected message. 261890 unexpected messages queued.
>> > rank 1 in job 11 slave_36134 caused collective abort of all ranks
>> > exit status of rank 1: killed by signal 9
>> >
>> > ====================================================================
>> >
>> >
>> > I have never seen such errors before. What causes this error?
>> > Is it an out-of-memory error? (About 20% of the memory on the machines
>> > is still free.)
>> >
>> > Any help would be greatly appreciated. Thanks in advance!
>> >
>> >
>> > --
>> > Best Regards.
>> > ---
>> > Xing FENG
>> > PhD Candidate
>> > Database Research Group
>> >
>> > School of Computer Science and Engineering
>> > University of New South Wales
>> > NSW 2052, Sydney
>> >
>> > Phone: (+61) 413 857 288
>> > _______________________________________________
>> > discuss mailing list discuss at mpich.org
>> > To manage subscription options or unsubscribe:
>> > https://lists.mpich.org/mailman/listinfo/discuss
--
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group
School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney
Phone: (+61) 413 857 288
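
The advice quoted above (match sends with receives, and post receives early) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is plain Python, it makes no MPI calls, and it does not reproduce MPICH's internal data structures. The names ToyReceiver, run_late_receives and run_early_receives are hypothetical and exist only for this illustration. The model assumes that a message arriving with no matching posted receive has to be buffered in an "unexpected" queue, which is the kind of buffering the error stack reports running out of memory for.

```python
class ToyReceiver:
    """Toy model of one rank's receive side (assumption: NOT MPICH's real internals)."""

    def __init__(self):
        self.posted = []          # tags of receives posted but not yet matched
        self.unexpected = []      # (tag, payload) buffered with no matching receive
        self.delivered = []       # payloads handed to the application
        self.peak_unexpected = 0  # largest size the unexpected queue ever reached

    def post_receive(self, tag):
        """The application posts a receive; an already-buffered message is matched first."""
        for i, (t, payload) in enumerate(self.unexpected):
            if t == tag:
                del self.unexpected[i]
                self.delivered.append(payload)
                return
        self.posted.append(tag)

    def message_arrives(self, tag, payload):
        """A message arrives from the network."""
        if tag in self.posted:
            # A receive is already waiting: deliver directly, nothing is buffered.
            self.posted.remove(tag)
            self.delivered.append(payload)
        else:
            # No receive posted yet: the message must be buffered.
            self.unexpected.append((tag, payload))
            self.peak_unexpected = max(self.peak_unexpected, len(self.unexpected))


def run_late_receives(n):
    """All n messages arrive before the application posts any receive."""
    r = ToyReceiver()
    for i in range(n):
        r.message_arrives(tag=0, payload=i)
    for _ in range(n):
        r.post_receive(tag=0)
    return r


def run_early_receives(n):
    """The application posts all n receives before any message arrives."""
    r = ToyReceiver()
    for _ in range(n):
        r.post_receive(tag=0)
    for i in range(n):
        r.message_arrives(tag=0, payload=i)
    return r


if __name__ == "__main__":
    for name, run in (("late receives", run_late_receives),
                      ("early receives", run_early_receives)):
        result = run(1000)
        print(name, "-> peak unexpected queue length:", result.peak_unexpected)
```

In both scenarios the application ends up with the same messages in the same order. The difference is how much the receiving side has to buffer in the meantime: in this model the late-receive case buffers every message, while the early-receive case buffers none. In a real MPI program the corresponding change would be to post the nonblocking receives before the matching sends are issued, as suggested in the quoted reply.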