[mpich-discuss] weird behavior with mpiexec (3.0.4)

Pavan Balaji balaji at mcs.anl.gov
Sat Jun 1 17:12:03 CDT 2013


We certainly want to give better error messages where possible.

I've created a ticket for it:

https://trac.mpich.org/projects/mpich/ticket/1872

  -- Pavan

On 05/29/2013 04:37 PM, Edscott Wilson wrote:
>
>
>
> 2013/5/29 Jeff Hammond <jhammond at alcf.anl.gov>
>
>
>      >
>      >
>      > Wouldn't a message such as "`pwd` directory does not exist on
>     node velascoj"
>      > be more illustrative?
>
>     Yes.  However, the set of improper uses of MPI that could generate
>     helpful error messages is uncountable.  Do you really think it is a
>     good use of finite developer effort to implement an infinitesimal
>     fraction of such warnings?  There has to be a minimum requirement
>     placed upon the user.  I personally think that it should include
>     running in a directory that actually exists.
>
>
> Certainly! But then again, some developer must have thought it a good
> idea, since under different circumstances, I get:
>
> /bin/bash -c  mpiexec -n 1 -hosts tauro,velascoj gmandel
> [proxy:0:0 at tauro] launch_procs (./pm/pmiserv/pmip_cb.c:648): unable to
> change wdir to /tmp/edscott/mnt/tauro-home/GIT/gmandel (No such file or
> directory)
> [proxy:0:0 at tauro] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
> [proxy:0:0 at tauro] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at tauro] main (./pm/pmiserv/pmip.c:206): demux engine error
> waiting for event
> [mpiexec at velascoj] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert
> (!closed) failed
> [mpiexec at velascoj] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at velascoj] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec at velascoj] main (./ui/mpich/mpiexec.c:331): process manager
> error waiting for completion
>
> That is inconsistent with the previous behavior. Anyway, it's no big deal.
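>
> For what it's worth, the "unable to change wdir" message presumably boils
> down to a failed chdir() on the proxy side, something along the lines of
> the sketch below (made-up helper, path, and node name; not the actual
> pmip_cb.c code):
>
>     /* sketch of the kind of check that yields "unable to change wdir";
>      * illustrative only, not the MPICH/hydra source */
>     #include <errno.h>
>     #include <stdio.h>
>     #include <string.h>
>     #include <unistd.h>
>
>     /* hypothetical helper: enter the requested working directory and
>      * report a readable error if it does not exist on this node */
>     static int enter_wdir(const char *wdir, const char *node)
>     {
>         if (chdir(wdir) != 0) {
>             fprintf(stderr, "unable to change wdir to %s on node %s (%s)\n",
>                     wdir, node, strerror(errno));
>             return -1;
>         }
>         return 0;
>     }
>
>     int main(void)
>     {
>         /* made-up path and node name, for illustration only */
>         return enter_wdir("/tmp/does-not-exist", "velascoj") ? 1 : 0;
>     }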
>
> BTW, would you happen to know why a process started with MPI_Comm_spawn
> goes into what seems like an active wait after MPI_Comm_disconnect and
> MPI_Finalize have been called? These spawned processes hog the CPU until
> the parent process exits. Curiously enough, this behavior is not mirrored
> in Open MPI.
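>
> The pattern in question looks roughly like the sketch below (a minimal,
> self-spawning example with a made-up file name, not the actual gmandel
> code):
>
>     /* spawn_spin.c - minimal sketch of the spawn/disconnect pattern;
>      * run the parent as: mpiexec -n 1 ./spawn_spin
>      * the child is launched below via MPI_Comm_spawn */
>     #include <mpi.h>
>     #include <unistd.h>
>
>     int main(int argc, char **argv)
>     {
>         MPI_Comm parent, intercomm;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_get_parent(&parent);
>
>         if (parent == MPI_COMM_NULL) {
>             /* parent: spawn one child running this same binary */
>             MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
>                            MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>             MPI_Comm_disconnect(&intercomm);
>             sleep(30);    /* parent keeps working after the child is done */
>         } else {
>             /* child: disconnect from the parent and finish right away */
>             MPI_Comm_disconnect(&parent);
>         }
>
>         /* the child seems to busy-wait around here until the parent exits */
>         MPI_Finalize();
>         return 0;
>     }
>
> While the parent is still in sleep(), top shows the spawned process at
> full CPU, which is the behavior I am describing.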
>
> Edscott
>
>
>
> -------------------------------
> Dr. Edscott Wilson Garcia
> Applied Mathematics and Computing
> Mexican Petroleum Institute
>
>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


