[mpich-discuss] ./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
Joni-Pekka Kurronen
joni.kurronen at gmail.com
Tue Aug 27 06:46:27 CDT 2013
My setup:
-Ubuntu 12.04
-rsh-redo-rsh
-three machines
-MPICH 3
-tried both export HYDRA_DEMUX=select and export HYDRA_DEMUX=poll
-tried both ssh and rsh
-added event_core and event_pthreads to LIBS
I can run the tests on one or two machines without errors, but
when I add the third machine to the cluster, the demux engine fails:
a connection hangs and nothing happens.
<MPITEST>
<NAME>uoplong</NAME>
<NP>11</NP>
<WORKDIR>./coll</WORKDIR>
<STATUS>fail</STATUS>
<TESTDIFF>
[mpiexec@mpi1] APPLICATION TIMED OUT
[proxy:0:0@mpi1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0@mpi1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@mpi1] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@mpi1] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@mpi1] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@mpi1] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:188): launcher returned error waiting for completion
[mpiexec@mpi1] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
</TESTDIFF>
</MPITEST>
I can also run:
joni@mpi1:/mpi3/S3/hpcc-1.4.2$ mpiexec -np 6 hostname
mpi1
mpi1
ugh
ugh
kaak
kaak
but if I run:
joni@mpi1:/mpi3/S3/hpcc-1.4.2$ mpiexec -np 6 ls
I get only one directory listing as output, and the
system hangs until I restart the slave machines!
--
Joni-Pekka Kurronen
AMRINA, Engineer (UAS), boat industry