[mpich-discuss] weird behavior with mpiexec (3.0.4)

Jeff Hammond jhammond at alcf.anl.gov
Wed May 29 16:02:00 CDT 2013


> This is weird. If I chdir to /home/edscott/tmp and execute the following
> command I get:
>
> edscott@tauro ~/tmp $ mpiexec -n 2 -hosts tauro,velascoj /bin/hostname
> tauro
> velascoj
>
> But if I chdir to /home/edscott/temp and do the same I get:
>
> edscott@tauro ~/temp $ mpiexec -n 2 -hosts tauro,velascoj /bin/hostname
> tauro
> [mpiexec@tauro] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
> [mpiexec@tauro] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@tauro] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec@tauro] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
>
> Wouldn't a message such as "`pwd` directory does not exist on node velascoj"
> be more illustrative?

Yes.  However, the set of improper uses of MPI that could generate
helpful error messages is uncountable.  Do you really think it is a
good use of finite developer effort to implement an infinitesimal
fraction of such warnings?  There has to be a minimum requirement
placed upon the user.  I personally think that it should include
running in a directory that actually exists on every node.
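
For anyone who would rather catch this before launching, a user-side
pre-flight check is easy to sketch.  The Python below is only an
illustration under stated assumptions: passwordless ssh to each host is
available, and the helper names (remote_dir_check_command,
hosts_missing_directory) and the `run` hook are hypothetical, not part
of MPICH or Hydra.

```python
import shlex
import subprocess


def remote_dir_check_command(host, directory):
    """Build an ssh argv that exits 0 only if `directory` exists on `host`.

    Assumes passwordless ssh to `host`; this is not an MPICH facility.
    """
    return ["ssh", host, "test -d " + shlex.quote(directory)]


def hosts_missing_directory(hosts, directory, run=None):
    """Return the hosts on which `directory` does not exist.

    `run` is a hypothetical hook: any callable that takes an argv list
    and returns an exit code. By default it uses subprocess.run.
    """
    if run is None:
        def run(argv):
            return subprocess.run(argv).returncode
    # Any nonzero exit code (missing directory or ssh failure) counts
    # as "missing" in this sketch.
    return [h for h in hosts
            if run(remote_dir_check_command(h, directory)) != 0]
```

Calling hosts_missing_directory(["tauro", "velascoj"], os.getcwd())
before mpiexec would list the nodes where the current directory is
absent.  mpiexec also accepts a -wdir argument to choose the working
directory explicitly, which avoids depending on the launch directory.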

Jeff

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides


