[mpich-discuss] ./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed

Joni-Pekka Kurronen joni.kurronen at gmail.com
Tue Aug 27 06:46:27 CDT 2013


I have:
-Ubuntu 12.04
-rsh-redo-rsh
-three machines
-mpich3
-have tried export HYDRA_DEMUX=select and export HYDRA_DEMUX=poll
-have tried both ssh and rsh
-have added to LIBS: event_core event_pthreads
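
For anyone trying to reproduce this, the demux settings listed above can be cycled through roughly as follows. This is only a sketch: the variable name and the two values come from the list above, the `-np 6 hostname` run mirrors the one shown further down, and the `command -v` guard and the loop variable `engine` are illustrative assumptions, not part of the actual setup.

```shell
# Sketch: try each Hydra demux engine in turn (values taken from the list above).
for engine in select poll; do
    export HYDRA_DEMUX="$engine"
    echo "HYDRA_DEMUX=$HYDRA_DEMUX"
    # Only launch if mpiexec is on PATH; report a failure instead of aborting the loop.
    if command -v mpiexec >/dev/null 2>&1; then
        mpiexec -np 6 hostname || echo "mpiexec failed with HYDRA_DEMUX=$engine"
    fi
done
```

After the loop, HYDRA_DEMUX is left set to the last value tried (poll).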

I can run the tests on one or two machines without error, but
when I add a third machine to the cluster the demux engine goes mad:
a connection hangs and nothing happens.


<MPITEST>
<NAME>uoplong</NAME>
<NP>11</NP>
<WORKDIR>./coll</WORKDIR>
<STATUS>fail</STATUS>
<TESTDIFF>
[mpiexec at mpi1] APPLICATION TIMED OUT
[proxy:0:0 at mpi1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0 at mpi1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at mpi1] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at mpi1] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec at mpi1] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at mpi1] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:188): launcher returned error waiting for completion
[mpiexec at mpi1] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
</TESTDIFF>
</MPITEST>

I can also run:
joni at mpi1:/mpi3/S3/hpcc-1.4.2$ mpiexec -np 6 hostname
mpi1
mpi1
ugh
ugh
kaak
kaak

But if I run:
joni at mpi1:/mpi3/S3/hpcc-1.4.2$ mpiexec -np 6 ls
I get only one directory listing as output, and the
system hangs until I restart the slave machines!




-- 

Joni-Pekka Kurronen
AMRINA, Engineer (AMK), boat industry



