[mpich-discuss] mpiexec crash

Jain, Rohit Rohit_Jain at mentor.com
Mon Oct 28 19:48:35 CDT 2013


Pavan,

We retried the runs. The ENOENT error is gone, but MPI still fails consistently with the same error:

> [proxy:0:0 at gretel] HYD_pmcd_pmip_control_cmd_cb (</PATH/TO>/src/pm/hydra/pm/pmiserv/pmip_cb.c:934): assert (!closed) failed
> [proxy:0:0 at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip.c:210): demux engine error waiting for event
> [mpiexec at gretel] control_cb (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
> [mpiexec at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at gretel] HYD_pmci_wait_for_completion (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
> [mpiexec at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion

We are running everything on the same machine, as:
	mpiexec -n 1 <exec> : -n 1 <exec> :.....

What would cause this error to appear? How do we debug such issues?

Regards,
Rohit


-----Original Message-----
From: discuss-bounces at mpich.org [mailto:discuss-bounces at mpich.org] On Behalf Of Pavan Balaji
Sent: Wednesday, October 23, 2013 7:22 PM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] mpiexec crash


On Oct 23, 2013, at 5:29 PM, Cherukumilli, Vasu <Vasu_Cherukumilli at mentor.com> wrote:
> Crash that we are seeing:
>  
> [proxy:0:0 at gretel] HYD_pmcd_pmip_control_cmd_cb (</PATH/TO>/src/pm/hydra/pm/pmiserv/pmip_cb.c:934): assert (!closed) failed
> [proxy:0:0 at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip.c:210): demux engine error waiting for event
> [mpiexec at gretel] control_cb (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
> [mpiexec at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at gretel] HYD_pmci_wait_for_completion (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
> [mpiexec at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion

These are cleanup messages.  The output should also have included a message saying so.
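
One way to set the cleanup traces aside when reading a captured log is to filter them out and look at what remains. This is a minimal sketch in Python, not part of Hydra: the function name is hypothetical, and it assumes every cleanup trace line starts with "[proxy:" or "[mpiexec", as in the log quoted above. Output printed by the application itself will also pass through the filter.

```python
import re

# Assumption: Hydra cleanup traces begin with "[proxy:" or "[mpiexec",
# matching the lines quoted in this thread.
TRACE_PREFIX = re.compile(r"^\[(proxy:|mpiexec)")


def non_trace_lines(output):
    """Return the non-empty lines of output that are not Hydra trace lines."""
    kept = []
    for raw in output.splitlines():
        line = raw.strip()
        if line and not TRACE_PREFIX.match(line):
            kept.append(line)
    return kept
```

Anything this returns is a candidate for the underlying error and is worth reading before the assert traces.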
 
> No such file or directory. (errno = ENOENT)

This is the real error message.  Did you make sure your executables are located on all the nodes in the same location?
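
A quick way to check this before launching is to test each executable path on the host in question. The following is a rough sketch in Python under stated assumptions: the helper name is hypothetical, it only inspects the machine it runs on, and it would have to be run on every node (or over a shared filesystem path) to cover a multi-node job.

```python
import os


def missing_executables(paths):
    """Return the paths that are not regular, executable files on this host."""
    return [
        p for p in paths
        if not (os.path.isfile(p) and os.access(p, os.X_OK))
    ]
```

Pass it the same executable paths that appear in the mpiexec command line; an empty result means every path resolved to an executable file on that host.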

  -- Pavan

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


