[mpich-discuss] Problem Running MPI on cluster

Kenneth Raffenetti raffenet at mcs.anl.gov
Wed Oct 22 08:51:31 CDT 2014


Hi,

Yes, in order to run your program, you'll first need to copy your MPICH
installation to all the machines listed in your hostfile. If you have
an NFS environment, it may be easier to install MPICH to a network
directory available on all machines.

Same goes for your executable. It must be present at the same filesystem 
location on all machines in order for mpiexec to find and run it.
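
If it helps, here is a rough sketch of how one might check that a given
path exists on every host before launching. The function names
(list_hosts, remote, check_path) and the REMOTE_SH override are made up
for illustration and are not part of MPICH. It assumes passwordless ssh,
one host per line in the hostfile (optionally as "host:nprocs", with "#"
comments), and paths without spaces.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not MPICH tools.

# Print the host names found in a hostfile: strip '#' comments, skip
# blank lines, and drop any ':N' process-count suffix.
list_hosts() {
    awk '{ sub(/#.*/, ""); if ($1 != "") { split($1, a, ":"); print a[1] } }' "$1"
}

# Run a command on a host. Uses "ssh -n" by default; set REMOTE_SH to
# another command or shell function to substitute a different runner.
remote() {
    rhost=$1
    shift
    ${REMOTE_SH:-ssh -n} "$rhost" "$@"
}

# Report, for every host in the hostfile, whether the path exists there.
# Returns non-zero if the path is missing on at least one host.
# Assumption: the path contains no spaces (ssh re-splits its arguments).
check_path() {
    hostfile=$1
    path=$2
    status=0
    for host in $(list_hosts "$hostfile"); do
        if remote "$host" test -e "$path"; then
            echo "ok      $host: $path"
        else
            echo "MISSING $host: $path"
            status=1
        fi
    done
    return $status
}
```

After sourcing the file, something like
"check_path host_file /usr/lib64/mpich/bin/mpirun" would cover the MPICH
install, and "check_path host_file /home/mhossain/testMpi/mpi_hello_world"
the working directory. Any host reported as MISSING needs the files
copied over (or a shared directory mounted) before mpiexec can start
processes there.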

Ken

On 10/22/2014 02:04 AM, Md. Amjad Hossain wrote:
>     Thank you so much Ken for your reply.
>
>     Do I have to copy the executable file to all machines? What I am doing
>     is coding, compiling, and running on a single machine, and host_file
>     contains the names of the other machines.
>
>     I ran the command you gave me. It prints only the name of the
>     machine I am executing the command on, followed by all the previous
>     errors. Here is the output after running the command twice:
>
>
>     [mhossain@md-lin-01 mpi_hello_world]$ /usr/lib64/mpich/bin/mpirun -n 4 -f host_file hostname
>     md-lin-01.mcs.kent.edu
>     [proxy:0:1@md-lin-02.mcs.kent.edu] launch_procs (./pm/pmiserv/pmip_cb.c:648): unable to change wdir to /home/mhossain/testMpi/mpi_hello_world (No such file or directory)
>     [proxy:0:1@md-lin-02.mcs.kent.edu] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
>     [proxy:0:1@md-lin-02.mcs.kent.edu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>     [proxy:0:1@md-lin-02.mcs.kent.edu] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
>     [mpiexec@md-lin-01.mcs.kent.edu] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
>     [mpiexec@md-lin-01.mcs.kent.edu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>     [mpiexec@md-lin-01.mcs.kent.edu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
>     [mpiexec@md-lin-01.mcs.kent.edu] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
>     [mhossain@md-lin-01 mpi_hello_world]$ /usr/lib64/mpich/bin/mpirun -n 4 -f host_file hostname
>     md-lin-01.mcs.kent.edu
>     [mpiexec@md-lin-01.mcs.kent.edu] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
>     [mpiexec@md-lin-01.mcs.kent.edu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>     [mpiexec@md-lin-01.mcs.kent.edu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
>     [mpiexec@md-lin-01.mcs.kent.edu] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
>
>     Is this a problem with the configuration or with the MPICH version?
>
>     Regards
>     Amjad
>
>
>
>
>
>     Does your mpi_hello_world binary exist in the same directory on all the
>     machines you are trying to run on? Can you try running this:
>
>     /usr/lib64/mpich/bin/mpirun -n 4 -f host_file hostname
>
>     If it outputs the names of the hosts in your hostfile, we can be
>     confident that your mpirun and ssh setup is functioning correctly.
>
>     Ken
>
>
>
>
>
>     On 10/21/2014 12:09 AM, Md. Amjad Hossain wrote:
>      > Hi, I am trying to run a simple hello world program on cluster
>      > nodes. I am running it with the following command but getting
>      > errors:
>      >
>      > Command:  /usr/lib64/mpich/bin/mpirun -n 4 -f host_file ./mpi_hello_world
>      >
>      > errors:
>      > [mpiexec@md-lin-01.mcs.kent.edu] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
>      > [mpiexec@md-lin-01.mcs.kent.edu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>      > [mpiexec@md-lin-01.mcs.kent.edu] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
>      > [mpiexec@md-lin-01.mcs.kent.edu] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>      >
>      >
>      > Before running the command I set the variables MPIRUN (to the MPI
>      > directory) and MPI_HOSTS=host_file. The "host_file" lists four
>      > nodes, and they can ssh to each other without a password.
>      >
>      > MPICH version I am running is: 3.0.4. The MPI code is attached.
>      >
>      > Any help to solve the problem please?
>      >
>      >
>      >
>      >
>      >
>      >
>
>
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>