[mpich-discuss] mpich hangs
Pavan Balaji
balaji at mcs.anl.gov
Thu Jun 27 22:24:11 CDT 2013
Looks like your application aborted for some reason.
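One way to read the "EXIT CODE: 9" in the output quoted below is as the number of the signal that ended the process. That is an assumption, since the launcher output does not say so explicitly. Under that assumption, a minimal Python sketch (the helper name `describe_exit_code` is made up for illustration) maps the number to a signal name:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return the signal name if `code` is a known signal number on this
    platform, otherwise a plain exit-status label.

    Assumption: the reported code is a signal number, which the launcher
    output quoted in this thread does not confirm.
    """
    try:
        return signal.Signals(code).name
    except ValueError:
        # Not a valid signal number on this platform.
        return f"exit status {code}"


print(describe_exit_code(9))
```

On Linux, 9 corresponds to SIGKILL, which a process cannot catch. If that reading is right, the process was killed from outside (for example by the kernel or by another user or tool) rather than exiting on its own, so the system logs on the node are worth a look.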
-- Pavan
On 06/27/2013 10:21 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
> My bad, I just found out that there was a duplicate entry like:
> weiser1 127.0.1.1
> weiser1 192.168.0.101
> so I removed the 127.x.x.x entry and kept the hostfile contents identical
> on both nodes. Now the previous error is reduced to this one:
>
> ------ START OF OUTPUT -------
>
> ....some HPL startup string (no final result)
> ...skip.....
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = EXIT CODE: 9
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:0 at weiser1] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:0 at weiser1] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at weiser1] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
> [mpiexec at weiser1] HYDT_bscu_wait_for_completion
> (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes
> terminated badly; aborting
> [mpiexec at weiser1] HYDT_bsci_wait_for_completion
> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
> for completion
> [mpiexec at weiser1] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
> completion
> [mpiexec at weiser1] main (./ui/mpich/mpiexec.c:331): process manager error
> waiting for completion
>
> ------ END OF OUTPUT -------
>
>
>
> On Fri, Jun 28, 2013 at 12:12 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>
> On 06/27/2013 10:08 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
>
> P4-businesscard=description#weiser2$port#57651$ifname#192.168.0.102$
> P5-businesscard=description#weiser2$port#52622$ifname#192.168.0.102$
> P6-businesscard=description#weiser2$port#55935$ifname#192.168.0.102$
> P7-businesscard=description#weiser2$port#54952$ifname#192.168.0.102$
> P0-businesscard=description#weiser1$port#41958$ifname#127.0.1.1$
> P2-businesscard=description#weiser1$port#35049$ifname#127.0.1.1$
> P1-businesscard=description#weiser1$port#39634$ifname#127.0.1.1$
> P3-businesscard=description#weiser1$port#51802$ifname#127.0.1.1$
>
>
> I have two concerns with your output. Let's start with the first.
>
> Did you look at this question on the FAQ page?
>
> "Is your /etc/hosts file consistent across all nodes? Unless you are
> using an external DNS server, the /etc/hosts file on every machine
> should contain the correct IP information about all hosts in the
> system."
>
>
> -- Pavan
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji
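
To check a node for the kind of entry discussed above (a machine's own hostname mapped to a 127.x.x.x address), the hosts file can be scanned for non-localhost names on loopback addresses. This is a rough Python sketch, not an MPICH tool. The function name `loopback_hostnames`, the `LOCAL_NAMES` set, and the sample contents are illustrative assumptions, and the sample uses the standard "address, then names" layout of /etc/hosts.

```python
import ipaddress

# Names that are expected to sit on a loopback address (assumed list).
LOCAL_NAMES = {"localhost", "localhost.localdomain",
               "ip6-localhost", "ip6-loopback"}


def loopback_hostnames(hosts_text):
    """Return (name, address) pairs for names mapped to a loopback
    address, ignoring the usual localhost aliases."""
    flagged = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if addr.is_loopback:
            flagged.extend((name, fields[0]) for name in fields[1:]
                           if name not in LOCAL_NAMES)
    return flagged


# Hypothetical contents modelled on the addresses in this thread.
sample = """\
127.0.0.1 localhost
127.0.1.1 weiser1
192.168.0.101 weiser1
192.168.0.102 weiser2
"""
print(loopback_hostnames(sample))
```

On a real node the text would come from reading /etc/hosts. An empty result on every node is consistent with the FAQ advice quoted above, though it does not prove that the files match across nodes.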