[mpich-discuss] Unexpected "Bad termination"

Halim Amer aamer at anl.gov
Fri Jan 29 10:42:55 CST 2016


Luiz,

Your application is hitting a segmentation fault: the "EXIT CODE: 11" in
your output corresponds to signal 11 (SIGSEGV).
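
To make the link between the "EXIT CODE: 11" line in the banner and a segmentation fault concrete, here is a minimal Python sketch that decodes a raw POSIX wait status. It assumes that the value printed by mpiexec is the unmodified wait status of the failed process and that the job ran on Linux, where signal 11 is SIGSEGV. The helper name describe_status is hypothetical.

```python
import os
import signal


def describe_status(status: int) -> str:
    """Describe a raw POSIX wait status (as returned by os.wait or os.waitpid)."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    if os.WIFEXITED(status):
        return f"exited normally with code {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status}"


if __name__ == "__main__":
    # 11 is the value shown in the banner; treating it as a raw wait
    # status is an assumption, not something the output states.
    print(describe_status(11))
```

A process that called exit(11) itself would normally have a raw wait status of 11 << 8 rather than 11, which is why the low value points to a signal.
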
However, we don't have enough information to pinpoint the source of the
problem. We usually need a small piece of code that reproduces the bug
in order to help with debugging.
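
Because the failure is intermittent, one way to work toward such a reproducer is to rerun a candidate test case until it fails and keep the output of the failing run. The following is a minimal Python sketch of that idea, not an MPICH tool. The function name run_until_failure, the process count, and the ./reproducer binary are hypothetical placeholders.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns (run_number, returncode, output) for the first failing run,
    or None if every run exits with status 0.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so the banner is captured
            text=True,
        )
        if result.returncode != 0:
            return run, result.returncode, result.stdout
    return None


if __name__ == "__main__":
    # Hypothetical command line; replace it with the real launcher invocation.
    cmd = sys.argv[1:] or ["mpiexec", "-n", "4", "./reproducer"]
    failure = run_until_failure(cmd)
    if failure is None:
        print("no failure observed")
    else:
        run, code, output = failure
        print(f"run {run} failed with return code {code}")
        print(output)
```

Shrinking the program while this loop still triggers the failure is one way to arrive at a test case small enough to share.
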

--Halim

www.mcs.anl.gov/~aamer

On 1/29/16 10:27 AM, Luiz Carlos da Costa Junior wrote:
> Dear all,
>
> We have been using MPICH with our software, running it on Amazon AWS
> Linux servers, for a long time.
> In production we used MPICH version 1.4.1p1 (which, I know, is very
> old), but it has been very stable in recent years.
> However, recently we have been facing a "Bad termination" problem once
> in a while, so we decided to investigate this issue.
> In principle, we don't have any apparent reason to believe that the
> problem lies in our code, since there were no changes that would
> explain this behavior.
> The other point is that it occurs intermittently: if we run the program
> again, it doesn't happen, so it has been difficult to debug or trace
> the source of the problem.
>
> Our first step, then, was to update MPICH to the latest version, 3.2.
> However, we ran into the same problem (output below):
>
>     =====================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   EXIT CODE: 11
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     =====================================================================================
>     [proxy:0:0@ip-10-137-129-86] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
>     [proxy:0:0@ip-10-137-129-86] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>     [mpiexec@ip-10-137-129-86] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
>     [mpiexec@ip-10-137-129-86] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>     [mpiexec@ip-10-137-129-86] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
>     [mpiexec@ip-10-137-129-86] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
>
>
> Do you have any idea what might be causing this problem?
> Any suggestions at this point would be highly appreciated.
>
> Best regards,
> Luiz
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>