[mpich-discuss] mpich hangs

Pavan Balaji balaji at mcs.anl.gov
Thu Jun 27 22:24:11 CDT 2013


Looks like your application aborted for some reason.

  -- Pavan

On 06/27/2013 10:21 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
> My bad, I just found out that there was a duplicate entry like:
> weiser1 127.0.1.1
> weiser1 192.168.0.101
> so I removed the 127.x.x.x entry and kept the hosts file contents similar
> on both nodes. Now the previous error has been reduced to this one:
>
> ------ START OF OUTPUT -------
>
> ... some HPL startup output (no final result) ...
> ... (skipped) ...
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 9
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:0@weiser1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:0@weiser1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0@weiser1] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec@weiser1] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec@weiser1] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@weiser1] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
> [mpiexec@weiser1] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
> ------ END OF OUTPUT -------
>
>
>
> On Fri, Jun 28, 2013 at 12:12 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
>     On 06/27/2013 10:08 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
>
>         P4-businesscard=description#weiser2$port#57651$ifname#192.168.0.102$
>         P5-businesscard=description#weiser2$port#52622$ifname#192.168.0.102$
>         P6-businesscard=description#weiser2$port#55935$ifname#192.168.0.102$
>         P7-businesscard=description#weiser2$port#54952$ifname#192.168.0.102$
>         P0-businesscard=description#weiser1$port#41958$ifname#127.0.1.1$
>         P2-businesscard=description#weiser1$port#35049$ifname#127.0.1.1$
>         P1-businesscard=description#weiser1$port#39634$ifname#127.0.1.1$
>         P3-businesscard=description#weiser1$port#51802$ifname#127.0.1.1$
>
>
>     I have two concerns with your output.  Let's start with the first.
>
>     Did you look at this question on the FAQ page?
>
>     "Is your /etc/hosts file consistent across all nodes? Unless you are
>     using an external DNS server, the /etc/hosts file on every machine
>     should contain the correct IP information about all hosts in the
>     system."
>
>
>       -- Pavan
>
>     --
>     Pavan Balaji
>     http://www.mcs.anl.gov/~balaji
>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
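The problem found in this thread (a hostname listed under both 127.0.1.1 and its LAN address, which made weiser1's processes advertise a loopback interface in their business cards) can be checked mechanically. What follows is a minimal Python sketch, not an MPICH tool: the function names and the sample file contents are hypothetical, and it assumes the standard /etc/hosts layout, in which the IP address comes first on each line, followed by one or more names. It flags hostnames listed with both a loopback and a non-loopback address, and compares two nodes' files in the spirit of the FAQ's consistency advice.

```python
import ipaddress
from collections import defaultdict


def parse_hosts(text):
    """Map each hostname to the addresses it is listed with, in file order.

    Assumes the usual /etc/hosts layout: an IP address followed by one or
    more names, with '#' starting a comment. Lines whose first field is not
    a valid IP address are skipped.
    """
    names = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address-first line; ignore it
        for name in fields[1:]:
            names[name].append(fields[0])
    return dict(names)


def loopback_conflicts(text):
    """Hostnames listed with both a loopback and a non-loopback address."""
    conflicts = {}
    for name, addrs in parse_hosts(text).items():
        kinds = {ipaddress.ip_address(a).is_loopback for a in addrs}
        if kinds == {True, False}:
            conflicts[name] = addrs
    return conflicts


def inconsistent_names(text_a, text_b):
    """Hostnames whose address sets differ between two hosts files."""
    a, b = parse_hosts(text_a), parse_hosts(text_b)
    return sorted(n for n in a.keys() | b.keys()
                  if set(a.get(n, [])) != set(b.get(n, [])))


if __name__ == "__main__":
    # Hypothetical file contents modeled on the entries quoted in this thread.
    weiser1_hosts = """\
127.0.0.1      localhost
127.0.1.1      weiser1
192.168.0.101  weiser1
192.168.0.102  weiser2
"""
    weiser2_hosts = """\
127.0.0.1      localhost
192.168.0.101  weiser1
192.168.0.102  weiser2
"""
    print(loopback_conflicts(weiser1_hosts))
    print(inconsistent_names(weiser1_hosts, weiser2_hosts))
```

With the sample contents, the script prints `{'weiser1': ['127.0.1.1', '192.168.0.101']}` followed by `['weiser1']`. On a real node you would pass the text of `/etc/hosts` instead of the sample strings.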