[mpich-discuss] Error: assert (!closed) failed
Pavan Balaji
balaji at mcs.anl.gov
Sun Dec 23 21:09:54 CST 2012
One of the proxy processes died suddenly, causing everything else to fall
apart. Try the latest version of Hydra (3.0.1), and also try running the
proxy processes under a debugger (set the environment variable
HYDRA_USE_DDD=1).
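
A minimal sketch of the debugger suggestion, reusing the run command quoted
below. It assumes HYDRA_USE_DDD is picked up from the environment of the
shell that starts mpiexec; the guard around the launch is only illustrative
and is not part of the original command.

```shell
# Assumption: Hydra reads HYDRA_USE_DDD from mpiexec's environment,
# so the variable is exported before the launch.
export HYDRA_USE_DDD=1

# Illustrative guard: launch only when both tools from the quoted command exist.
if command -v mpiexec >/dev/null 2>&1 && command -v xreservation >/dev/null 2>&1; then
    # Same arguments as the quoted run command, without the output redirections,
    # so the messages stay visible in the terminal.
    mpiexec -launcher-exec /usr/bin/xos-createProcess -np 3 \
        -host $(xreservation -a "$XOS_RSVID") ./mpi/hello_world_MPI
else
    echo "mpiexec or xreservation not found in PATH; nothing launched" >&2
fi
```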
-- Pavan
On 08/20/2012 03:46 AM US Central Time, Yann RADENAC wrote:
> Hi,
>
> I'm developing MPI support for XtreemOS (www.xtreemos.eu) so that an MPI
> program is managed as a single XtreemOS job.
> To manage all processes as a single XtreemOS job, I've developed the
> program xos-createProcess, which plays the role of the launcher
> (replacing ssh/rsh) and starts a process on a remote machine, one of
> those reserved for the current job.
>
> I'm running a simple hello world MPI program where each process sends
> a string to process 0, which prints them on standard output.
>
> When using MPICH2 with ssh, this program works perfectly on several
> machines.
>
> When using MPICH2 with my launcher xos-createProcess, it works with an
> MPI program of 2 processes on 2 different machines.
>
> However, I cannot get past the following error, which occurs when
> running an MPI program of 3 processes on 3 different machines (or any n
> processes on n different machines with n >= 3). Everything terminates
> almost immediately with these error messages:
>
>
> Process 0 ends with error code 7 and its standard error output is:
>
> [mpiexec@paradent-2.rennes.grid5000.fr] cmd_response
> (./pm/pmiserv/pmiserv_pmi_v1.c:29): assert (!closed) failed
> [mpiexec@paradent-2.rennes.grid5000.fr] fn_barrier_in
> (./pm/pmiserv/pmiserv_pmi_v1.c:70): error writing PMI line
> [mpiexec@paradent-2.rennes.grid5000.fr] handle_pmi_cmd
> (./pm/pmiserv/pmiserv_cb.c:44): PMI handler returned error
> [mpiexec@paradent-2.rennes.grid5000.fr] control_cb
> (./pm/pmiserv/pmiserv_cb.c:289): unable to process PMI command
> [mpiexec@paradent-2.rennes.grid5000.fr] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@paradent-2.rennes.grid5000.fr] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> [mpiexec@paradent-2.rennes.grid5000.fr] main (./ui/mpich/mpiexec.c:405):
> process manager error waiting for completion
>
>
> On *only* one of the other processes, the standard error output is:
>
> [proxy:0:0@paradent-1.rennes.grid5000.fr] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
> [proxy:0:0@paradent-1.rennes.grid5000.fr] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0@paradent-1.rennes.grid5000.fr] main
> (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
>
>
>
> The run command is:
>
> -bash -c '(mpiexec -launcher-exec /usr/bin/xos-createProcess -np 3
> -host `xreservation -a $XOS_RSVID` ./mpi/hello_world_MPI < /dev/null >
> mpiexec.out) >& mpiexec.err'
>
>
>
> Can anyone explain to me what this error means?
>
> I'm using MPICH2 1.4.1p1
>
>
> Thanks for your help.
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji