[mpich-discuss] discuss Digest, Vol 48, Issue 6

Mahdi, Sam sam.mahdi.846 at my.csun.edu
Tue Nov 1 16:17:27 CDT 2016


Hello everyone,

The hosts file contains only the IP address of the other computer. So it's just:
130.166.115.232

When I ran mpirun --host <my computer>,<the other computer> /bin/hostname,
this was the output:
crowlab: [~]> mpirun --host 130.166.115.232,130.166.115.232 /bin/hostname
[proxy:0:0 at localhost.localdomain] HYDU_sock_connect
(./utils/sock/sock.c:174): unable to connect from "localhost.localdomain"
to "localhost.localdomain" (Connection refused)
[proxy:0:0 at localhost.localdomain] main (./pm/pmiserv/pmip.c:189): unable to
connect to server localhost.localdomain at port 41231 (check for firewalls!)

Again, this is the same problem as before: it's attempting to connect from
"localhost.localdomain" (my computer) to "localhost.localdomain" (again my
computer) instead of using the IP addresses that I gave it.
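
One common cause of this symptom is that the launching machine's hostname resolves to a loopback address, so the remote proxy is told to connect back to "localhost.localdomain" and reaches itself. This is an assumption; nothing in this thread confirms it. A minimal sketch of a check follows. The function name loopback_entries is hypothetical, and it only inspects a hosts-style file, not DNS.

```shell
# Report hosts-file entries that map NAME to a loopback address.
# Usage: loopback_entries NAME [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
loopback_entries() {
    name=$1
    file=${2:-/etc/hosts}
    # Strip comments, then print lines whose address is 127.x.x.x or ::1
    # and whose list of names contains NAME.
    sed 's/#.*//' "$file" | awk -v n="$name" '
        ($1 ~ /^127\./ || $1 == "::1") {
            for (i = 2; i <= NF; i++) if ($i == n) { print; break }
        }'
}
```

Running loopback_entries "$(hostname)" on each machine prints any offending line. If it prints nothing, the cause is probably elsewhere.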

Sincerely,
Sam



On Mon, Oct 31, 2016 at 9:47 PM, <discuss-request at mpich.org> wrote:

> Send discuss mailing list submissions to
>         discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
>         discuss-request at mpich.org
>
> You can reach the person managing the list at
>         discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
>    1. Re:  Running a program on multiple computers (Kenneth Raffenetti)
>    2. Re:  Using external load-balancer with mpich (Halim Amer)
>    3. Re:  Using external load-balancer with mpich
>       (lostfreeman at gmail.com)
>    4. Re:  Is there a way to set timeout to mpi process launch?
>       (Halim Amer)
>    5. Re:  Using external load-balancer with mpich (Halim Amer)
>    6. Re:  Is there a way to set timeout to mpi process launch?
>       (Pranav Ladkat)
>    7.  ADIOI_Set_lock error (Luke Van Roekel)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 27 Oct 2016 22:17:59 -0500
> From: Kenneth Raffenetti <raffenet at mcs.anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Running a program on multiple computers
> Message-ID: <3a59ae1a-8577-4b86-3e99-b27d97544ac6 at mcs.anl.gov>
> Content-Type: text/plain; charset="utf-8"; format=flowed
>
> Hi,
>
> Apologies for the delay in response. My comments inline below.
>
> On 10/15/2016 07:40 PM, Mahdi, Sam wrote:
> > Hello everyone,
> >
> > I am attempting to run a single program on 32 cores split across 4
> > computers (So each computer has 8 cores). I am attempting to use mpich
> > for this. I am currently testing on just 2 computers; the program and
> > mpich are installed on both. I have set up a
> > key and can log in to the other computer using ssh without
> > a password. I have come across 2 problems. First, when I attempt to
> > connect using the mpirun -np 3 --host a (the IP of the computer I am
> > attempting to connect to) hostname
> > I receive the error
> >  unable to connect from "localhost.localdomain" to "localhost.localdomain"
> >
> > This indicates that my computer's "localhost.localdomain" is attempting to
> > connect to another "localhost.localdomain". How can I change this so
> > that it connects via my IP to the other computer's IP?
> >
> > Secondly, I attempted to use a host file instead, following the Hydra
> > process manager wiki. I created a hosts file with just the IP of the
> > computer I am attempting to connect to. When I type in the command
> > mpiexec -f hosts -n 4 ./applic
> >
> > I get this error
> > [mpiexec at localhost.localdomain] HYDU_parse_hostfile
> > (./utils/args/args.c:323): unable to open host file: hosts
> >
> > along with other errors of unable to parse hostfile, match handler etc.
> > I assume this is all due to it being unable to read the host file. Is
> > there any specific place I should save my hosts file? I have it saved
> > directly on my Desktop. I have attempted to indicate the full path where
> > it is located, but I still get the same error.
>
> There is no required location for the hosts file. If you are specifying
> the full path and there are still issues, it may be a formatting issue. Can
> you paste or attach the contents of your hosts file so we can confirm
> the format is good?
>
> >
> > For the first problem, I have read that I need to change /etc/hosts
> > manually by using the sudo command to manually enter the IP of the
> > computer I am attempting to connect to in the /etc/hosts file. I assume
> > the computer is attempting to connect to itself (set up the program
> > first on its own core, then send it to another, hence attempting to
> > start it on localhost.localdomain).
> >
> > For the second problem, I have attempted to run the command
> >  mpirun --host my computer IP, the other computer IP ./program
>
> This format should be okay for your purposes. What happens if you try:
>
>    mpirun --host my computer IP, the other computer IP /bin/hostname
>
> If the hostnames of each host are echoed to the command-line, then job
> launch is successful and the issue is in connection setup during
> MPI_Init.
>
> Ken
>
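
On the hosts file question in the message above: the "unable to open host file: hosts" error suggests the path given to -f could not be opened from the directory where mpiexec was run. The sketch below checks readability and lists the entries. The function name list_hostfile is hypothetical, and treating "#" as a comment marker is an assumption about the host file format.

```shell
# Check that a host file is readable and list its entries.
# Usage: list_hostfile FILE
list_hostfile() {
    file=$1
    if [ ! -r "$file" ]; then
        # A relative path is resolved against the current directory.
        echo "cannot read host file: $file (cwd: $(pwd))" >&2
        return 1
    fi
    # Assumption: '#' starts a comment. Drop comments, trailing
    # whitespace, and blank lines; print the remaining entries.
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' "$file" | awk 'NF'
}
```

For a file that holds only the other computer's IP, this should print that one address and nothing else.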
>
> ------------------------------
>
> Message: 2
> Date: Thu, 27 Oct 2016 23:18:08 -0500
> From: Halim Amer <aamer at anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Using external load-balancer with mpich
> Message-ID: <84e884b3-6a11-fdf7-c143-93adc64a4feb at anl.gov>
> Content-Type: text/plain; charset="windows-1252"; format=flowed
>
> I don't understand what you are trying to do. Can you give an example?
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 10/26/16 5:21 PM, lost wrote:
> > Can I use an external load balancer with mpiexec by providing a single
> > hostname in hosts file with, optionally, some large number for host
> > rank, and putting load balancer listening on that hostname and
> > forwarding connections to the actual hosts?
> >
> > I am trying to achieve autoscaling (load balancer tracks liveness of
> > hosts and spins up new ones on demand).
> >
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 27 Oct 2016 21:59:03 -0700
> From: <lostfreeman at gmail.com>
> To: Halim Amer <aamer at anl.gov>, "discuss at mpich.org"
>         <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Using external load-balancer with mpich
> Message-ID: <5812db17.5789620a.c825c.7724 at mx.google.com>
> Content-Type: text/plain; charset="utf-8"
>
> I have a system that can start and stop machines depending on the fleet's
> current load. Essentially, it must be the one to decide which machine will
> receive a new task, and optionally acquire a new one from some pool
> for it.
>
> For example, I can have two hosts of six currently executing something
> (possibly unrelated to MPICH), and my load balancer machine is aware of
> it. The other 4 are in a low-power state to save costs. So when I call
> mpiexec, I want to tell it to send all the tasks to the load balancer,
> requesting 2 hosts, so that the load balancer can then start two new hosts
> to handle that request. If the initial hosts were not busy with some other
> task, the load balancer would send incoming jobs to them and keep the other
> four deallocated.
>
> - Victor
>
> ------------------------------
>
> Message: 4
> Date: Fri, 28 Oct 2016 00:15:00 -0500
> From: Halim Amer <aamer at anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Is there a way to set timeout to mpi
>         process launch?
> Message-ID: <d5f14193-4d19-6acb-1c81-7768eb4c046d at anl.gov>
> Content-Type: text/plain; charset="windows-1252"; format=flowed
>
> You can try setting MPIEXEC_TIMEOUT=<timeout value in seconds> to force
> the job to abort after running for the specified period. This is for the
> whole execution though, not just for the process launching step.
>
> Halim
> www.mcs.anl.gov/~aamer
>
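
A minimal sketch of setting MPIEXEC_TIMEOUT for a single launch, as suggested above. The wrapper name with_mpiexec_timeout is hypothetical. As noted in the message, the limit covers the whole run, not only process launch.

```shell
# Run a command with MPIEXEC_TIMEOUT set for that command only.
# Usage: with_mpiexec_timeout SECONDS COMMAND [ARGS...]
with_mpiexec_timeout() {
    secs=$1
    shift
    # Reject anything that is not a whole number of seconds.
    case $secs in
        ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 2 ;;
    esac
    # env limits the variable to the child process.
    env MPIEXEC_TIMEOUT="$secs" "$@"
}
```

For example, with_mpiexec_timeout 600 mpiexec -f hosts -n 4 ./applic would ask mpiexec to abort the job after ten minutes.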
> On 10/26/16 6:30 PM, Pranav Ladkat wrote:
> > Hi,
> >
> > When I run an MPI program on multiple hosts and the executable fails to
> > start on any of the hosts (for example, because of a missing library), the
> > other hosts just keep waiting for the process to come up. The program
> > hangs forever. Is there any way to set a timeout for such cases, so that
> > MPI aborts if not all processes have launched within a given
> > period?
> >
> > Thanks,
> > Pranav
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 28 Oct 2016 09:58:54 -0500
> From: Halim Amer <aamer at anl.gov>
> To: <lostfreeman at gmail.com>, "discuss at mpich.org" <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Using external load-balancer with mpich
> Message-ID: <7e6bf186-fbe7-6651-c5fb-769a44db76f1 at anl.gov>
> Content-Type: text/plain; charset="utf-8"; format=flowed
>
> I don't know which load balancer you are using, but Hydra (the
> process manager) can interact with several standard job schedulers, such
> as SLURM and PBS. You can refer to
> https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
> for more information about how to use Hydra.
>
> Alternatively, if you want to keep your own scheduler, the only solution
> that comes to mind is for the scheduler to communicate indirectly with
> Hydra through a hosts file. The scheduler publishes the list of hosts in
> a file, say hosts.txt, and this file gets passed to hydra with the -f
> option: mpiexec -f hosts.txt. It is up to you to synchronize the
> scheduler and Hydra properly.
>
> Halim
> www.mcs.anl.gov/~aamer
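
One way the scheduler side of the hosts.txt handoff described above could be sketched. The function name publish_hosts is hypothetical. Writing to a temporary file and renaming it is a design choice made here so that a reader never sees a half-written list; the thread does not prescribe it.

```shell
# Write a host list to FILE atomically: build it in a temp file in the same
# directory, then rename it into place.
# Usage: publish_hosts FILE HOST [HOST...]
publish_hosts() {
    target=$1
    shift
    tmp=$(mktemp "$target.XXXXXX") || return 1
    # One host per line; rename only if the write succeeded.
    for h in "$@"; do
        printf '%s\n' "$h"
    done > "$tmp" && mv -f "$tmp" "$target"
}
```

The scheduler would call publish_hosts hosts.txt with the hosts it has chosen and then run mpiexec -f hosts.txt. Waiting until those hosts are actually up is still the scheduler's job.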
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 31 Oct 2016 16:18:38 -0700
> From: Pranav Ladkat <pranavpr at buffalo.edu>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Is there a way to set timeout to mpi
>         process launch?
> Message-ID: <CACV4Dhp2f7OC07r+k8SWucz39iKAxm5BqemGwfR_2oFT+oBs+g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Specifying MPIEXEC_TIMEOUT would not work for us, since execution times
> vary from job to job. Is there any other solution available
> within mpich?
>
> Thanks,
> Pranav
>
>
>
> ------------------------------
>
> Message: 7
> Date: Mon, 31 Oct 2016 22:47:16 -0600
> From: Luke Van Roekel <luke.vanroekel at gmail.com>
> To: discuss at mpich.org
> Subject: [mpich-discuss] ADIOI_Set_lock error
> Message-ID: <CALP1AOFzKe_XpGs7fhx5WEZm_N4W+ouFCzt0dYqfTL7rGSBjuA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I've been trying to compile and run a very simple mpi test on our cluster
> with intel-mpi and openmpi.  The test program is below.  When I run with
> openmpi everything is fine.  When I run with intel-mpi, I receive the
> following error
>
> This requires fcntl(2) to be implemented. As of 8/25/2011 it is not.
> Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd 6,cmd
> F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno
> 26.
>
> - If the file system is NFS, you need to use NFS version 3, ensure that the
> lockd daemon is running on all the machines, and mount the directory with
> the 'noac' option (no attribute caching).
>
> - If the file system is LUSTRE, ensure that the directory is mounted with
> the 'flock' option.
>
> ADIOI_Set_lock:: Function not implemented
>
> ADIOI_Set_lock:offset 0, length 4
>
> Any thoughts on how to proceed?  The size/format of the file read in seems
> to make no difference.
>
> Regards,
> Luke
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc, char **argv) {
>     int buf = 0, err;   /* initialized so the write has a defined value */
>     MPI_File fh;
>     MPI_Status status;
>
>     MPI_Init(&argc, &argv);
>     if (argc != 2) {
>         printf("Usage: %s filename\n", argv[0]);
>         MPI_Finalize();
>         return 1;
>     }
>     err = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_CREATE |
>                         MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>     if (err != MPI_SUCCESS) printf("Error: MPI_File_open()\n");
>
>     err = MPI_File_write_all(fh, &buf, 1, MPI_INT, &status);
>     if (err != MPI_SUCCESS) printf("Error: MPI_File_write_all()\n");
>
>     MPI_File_close(&fh);
>     MPI_Finalize();
>     return 0;
> }
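
The error text above points at mount options ('noac' for NFS, 'flock' for Lustre). A minimal sketch for checking them on a given mount point follows. The function names mount_info and has_mount_option are hypothetical. It reads a /proc/mounts-style file and expects the exact mount point, not a path below it.

```shell
# Print the filesystem type and mount options for MOUNTPOINT, read from a
# /proc/mounts-style file (fields: device mountpoint fstype options ...).
# Usage: mount_info MOUNTPOINT [MOUNTS_FILE]
mount_info() {
    awk -v mp="$1" '$2 == mp { print $3, $4 }' "${2:-/proc/mounts}"
}

# Return 0 if OPTION appears in a comma-separated option list.
# Usage: has_mount_option OPTIONS OPTION
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For a Lustre directory one could run mount_info on its mount point and pass the printed options to has_mount_option with flock. For NFS, check for noac in the same way.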
>
> ------------------------------
>
> End of discuss Digest, Vol 48, Issue 6
> **************************************
>