[mpich-discuss] mpiexec crash
Jain, Rohit
Rohit_Jain at mentor.com
Mon Oct 28 19:48:35 CDT 2013
Pavan,
We retried the runs. The ENOENT error no longer appears, but MPI still fails consistently with the same error:
> [proxy:0:0 at gretel] HYD_pmcd_pmip_control_cmd_cb (</PATH/TO>/src/pm/hydra/pm/pmiserv/pmip_cb.c:934): assert (!closed) failed
> [proxy:0:0 at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip.c:210): demux engine error waiting for event
> [mpiexec at gretel] control_cb (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
> [mpiexec at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at gretel] HYD_pmci_wait_for_completion (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
> [mpiexec at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion
We are running all processes on the same machine, as follows:
mpiexec -n 1 <exec> : -n 1 <exec> :.....
What could cause this error? How do we debug such issues?
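One first debugging step, following Pavan's point that the Hydra lines are cleanup messages, is to separate them from any other output so the underlying error stands out. The sketch below is a hypothetical helper (`non_hydra_lines` is not part of MPICH). It assumes, as in the excerpts quoted in this thread, that Hydra cleanup lines start with `[proxy:... at host]` or `[mpiexec at host]` and the real error does not.

```python
import re
from typing import List

# Assumption: Hydra cleanup lines begin with "[proxy:<ids> at <host>]" or
# "[mpiexec at <host>]", optionally preceded by an email quote marker "> ".
HYDRA_PREFIX = re.compile(r"^\s*(?:>\s*)?\[(?:proxy:[\d:]+|mpiexec) at [^\]]+\]")


def non_hydra_lines(output: str) -> List[str]:
    """Return non-empty lines that do not carry a Hydra proxy/mpiexec prefix."""
    return [
        line
        for line in output.splitlines()
        if line.strip() and not HYDRA_PREFIX.match(line)
    ]
```

Feed it the complete stdout/stderr of the failing `mpiexec` run. It is only a first-pass filter: if the real error happens to carry a Hydra prefix, it will be hidden too.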
Regards,
Rohit
-----Original Message-----
From: discuss-bounces at mpich.org [mailto:discuss-bounces at mpich.org] On Behalf Of Pavan Balaji
Sent: Wednesday, October 23, 2013 7:22 PM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] mpiexec crash
On Oct 23, 2013, at 5:29 PM, Cherukumilli, Vasu <Vasu_Cherukumilli at mentor.com> wrote:
> Crash that we are seeing:
>
> [proxy:0:0 at gretel] HYD_pmcd_pmip_control_cmd_cb (</PATH/TO>/src/pm/hydra/pm/pmiserv/pmip_cb.c:934): assert (!closed) failed
> [proxy:0:0 at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmip.c:210): demux engine error waiting for event
> [mpiexec at gretel] control_cb (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
> [mpiexec at gretel] HYDT_dmxu_poll_wait_for_event (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at gretel] HYD_pmci_wait_for_completion (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:196): error waiting for event
> [mpiexec at gretel] main (</PATH/TO>/src/mpich2-1.5/src/pm/hydra/ui/mpich/mpiexec.c:325): process manager error waiting for completion
These are cleanup messages. You should also have received output stating the actual error.
> No such file or directory. (errno = ENOENT)
This is the real error message. Did you make sure your executables exist at the same location on all the nodes?
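Since ENOENT means a file could not be found, a quick pre-flight check is to confirm that every executable named on the `mpiexec` command line resolves to an executable file. The helper below is a hypothetical sketch (`missing_executables` is not an MPICH tool). It checks only the host it runs on, so for multi-node runs it would need to be run on each node with the same paths.

```python
import shutil
from typing import List


def missing_executables(paths: List[str]) -> List[str]:
    """Return the entries that do not resolve to an executable file on this host.

    Bare names are searched on PATH; names containing a directory component
    are checked exactly as given (this is how shutil.which behaves).
    """
    return [p for p in paths if shutil.which(p) is None]
```

For example, pass it the list of `<exec>` arguments from the `mpiexec -n 1 <exec> : -n 1 <exec> : ...` command. An empty result means each executable was found locally.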
-- Pavan
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji
_______________________________________________
discuss mailing list discuss at mpich.org