[mpich-discuss] MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out

amelie chi zhou amelie.czhou at gmail.com
Tue Mar 15 22:26:01 CDT 2016


Hi, Pavan,

Here is the full output. Thanks!

ubuntu@ip-10-237-132-179:~/mpitest/mpitutorial/tutorials/mpi-send-and-receive/code$ mpiexec -n 2 -f host_file -verbose ./send_recv
host: ec2-54-185-239-50.us-west-2.compute.amazonaws.com
host: ec2-54-196-213-218.compute-1.amazonaws.com

==================================================================================================
mpiexec options:
----------------
  Base path: /usr/local/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=155.69.144.109 63331 22
    SSH_TTY=/dev/pts/0
    USER=ubuntu

LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
    MAIL=/var/mail/ubuntu

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
    PWD=/home/ubuntu/mpitest/mpitutorial/tutorials/mpi-send-and-receive/code
    LANG=en_US.UTF-8
    SHLVL=1
    HOME=/home/ubuntu
    LOGNAME=ubuntu
    SSH_CONNECTION=155.69.144.109 63331 10.237.132.179 22
    LESSOPEN=| /usr/bin/lesspipe %s
    LESSCLOSE=/usr/bin/lesspipe %s %s
    OLDPWD=/home/ubuntu/mpitest
    _=/usr/local/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: ec2-54-185-239-50.us-west-2.compute.amazonaws.com (1 cores)
      Exec list: ./send_recv (1 processes);

      [2] proxy: ec2-54-196-213-218.compute-1.amazonaws.com (1 cores)
      Exec list: ./send_recv (1 processes);


==================================================================================================

[mpiexec at ip-10-237-132-179] Timeout set to -1 (-1 means infinite)
[mpiexec at ip-10-237-132-179] Got a control port string of
ec2-54-185-239-50.us-west-2.compute.amazonaws.com:38817

Proxy launch args: /usr/local/bin/hydra_pmi_proxy --control-port
ec2-54-185-239-50.us-west-2.compute.amazonaws.com:38817 --debug --rmk user
--launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id

Arguments being passed to proxy 0:
--version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
--hostname ec2-54-185-239-50.us-west-2.compute.amazonaws.com
--global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2
--auto-cleanup 1 --pmi-kvsname kvs_1442_0 --pmi-process-mapping
(vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 18 'TERM=xterm'
'SHELL=/bin/bash' 'SSH_CLIENT=155.69.144.109 63331 22' 'SSH_TTY=/dev/pts/0'
'USER=ubuntu'
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:'
'MAIL=/var/mail/ubuntu'
'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games'
'PWD=/home/ubuntu/mpitest/mpitutorial/tutorials/mpi-send-and-receive/code'
'LANG=en_US.UTF-8' 'SHLVL=1' 'HOME=/home/ubuntu' 'LOGNAME=ubuntu'
'SSH_CONNECTION=155.69.144.109 63331 10.237.132.179 22' 'LESSOPEN=|
/usr/bin/lesspipe %s' 'LESSCLOSE=/usr/bin/lesspipe %s %s'
'OLDPWD=/home/ubuntu/mpitest' '_=/usr/local/bin/mpiexec' --global-user-env
0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1
--exec-local-env 0 --exec-wdir
/home/ubuntu/mpitest/mpitutorial/tutorials/mpi-send-and-receive/code
--exec-args 1 ./send_recv

Arguments being passed to proxy 1:
--version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
--hostname ec2-54-196-213-218.compute-1.amazonaws.com --global-core-map
0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1
--pmi-kvsname kvs_1442_0 --pmi-process-mapping (vector,(0,2,1))
--ckpoint-num -1 --global-inherited-env 18 'TERM=xterm' 'SHELL=/bin/bash'
'SSH_CLIENT=155.69.144.109 63331 22' 'SSH_TTY=/dev/pts/0' 'USER=ubuntu'
'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:'
'MAIL=/var/mail/ubuntu'
'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games'
'PWD=/home/ubuntu/mpitest/mpitutorial/tutorials/mpi-send-and-receive/code'
'LANG=en_US.UTF-8' 'SHLVL=1' 'HOME=/home/ubuntu' 'LOGNAME=ubuntu'
'SSH_CONNECTION=155.69.144.109 63331 10.237.132.179 22' 'LESSOPEN=|
/usr/bin/lesspipe %s' 'LESSCLOSE=/usr/bin/lesspipe %s %s'
'OLDPWD=/home/ubuntu/mpitest' '_=/usr/local/bin/mpiexec' --global-user-env
0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1
--exec-local-env 0 --exec-wdir
/home/ubuntu/mpitest/mpitutorial/tutorials/mpi-send-and-receive/code
--exec-args 1 ./send_recv

[mpiexec at ip-10-237-132-179] Launch arguments:
/usr/local/bin/hydra_pmi_proxy --control-port
ec2-54-185-239-50.us-west-2.compute.amazonaws.com:38817 --debug --rmk user
--launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at ip-10-237-132-179] Launch arguments: /usr/bin/ssh -x
ec2-54-196-213-218.compute-1.amazonaws.com "/usr/local/bin/hydra_pmi_proxy"
--control-port ec2-54-185-239-50.us-west-2.compute.amazonaws.com:38817
--debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
--usize -2 --proxy-id 1
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=response_to_init
pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): get_maxes

[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): get_appnum

[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=appnum appnum=0
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=my_kvsname
kvsname=kvs_1442_0
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): get_my_kvsname

[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=my_kvsname
kvsname=kvs_1442_0
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): get
kvsname=kvs_1442_0 key=PMI_process_mapping
[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=get_result rc=0 msg=success
value=(vector,(0,2,1))
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): barrier_in

[proxy:0:0 at ip-10-237-132-179] forwarding command (cmd=barrier_in) upstream
[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=response_to_init
pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): get_maxes

[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=maxes kvsname_max=256
keylen_max=64 vallen_max=1024
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): get_appnum

[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=appnum appnum=0
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): get_my_kvsname

[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=my_kvsname kvsname=kvs_1442_0
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): get_my_kvsname

[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=my_kvsname kvsname=kvs_1442_0
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): get
kvsname=kvs_1442_0 key=PMI_process_mapping
[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=get_result rc=0 msg=success
value=(vector,(0,2,1))
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): barrier_in

[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at ip-10-237-132-179] PMI response to fd 6 pid 4: cmd=barrier_out
[mpiexec at ip-10-237-132-179] PMI response to fd 7 pid 4: cmd=barrier_out
[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=barrier_out
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): put
kvsname=kvs_1442_0 key=P0-businesscard value=description#
ec2-54-185-239-50.us-west-2.compute.amazonaws.com
$port#34711$ifname#10.237.132.179$
[proxy:0:0 at ip-10-237-132-179] cached command: P0-businesscard=description#
ec2-54-185-239-50.us-west-2.compute.amazonaws.com
$port#34711$ifname#10.237.132.179$
[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): barrier_in

[proxy:0:0 at ip-10-237-132-179] flushing 1 put command(s) out
[proxy:0:0 at ip-10-237-132-179] forwarding command (cmd=put
P0-businesscard=description#
ec2-54-185-239-50.us-west-2.compute.amazonaws.com$port#34711$ifname#10.237.132.179$)
upstream
[proxy:0:0 at ip-10-237-132-179] forwarding command (cmd=barrier_in) upstream
[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=put
P0-businesscard=description#
ec2-54-185-239-50.us-west-2.compute.amazonaws.com
$port#34711$ifname#10.237.132.179$
[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at ip-10-37-219-175] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=barrier_out
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): put
kvsname=kvs_1442_0 key=P1-businesscard value=description#
ec2-54-196-213-218.compute-1.amazonaws.com$port#50148$ifname#10.37.219.175$
[proxy:0:1 at ip-10-37-219-175] cached command: P1-businesscard=description#
ec2-54-196-213-218.compute-1.amazonaws.com$port#50148$ifname#10.37.219.175$
[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at ip-10-37-219-175] got pmi command (from 4): barrier_in

[proxy:0:1 at ip-10-37-219-175] flushing 1 put command(s) out
[proxy:0:1 at ip-10-37-219-175] forwarding command (cmd=put
P1-businesscard=description#ec2-54-196-213-218.compute-1.amazonaws.com$port#50148$ifname#10.37.219.175$)
upstream
[proxy:0:1 at ip-10-37-219-175] forwarding command (cmd=barrier_in) upstream
[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=put
P1-businesscard=description#ec2-54-196-213-218.compute-1.amazonaws.com
$port#50148$ifname#10.37.219.175$
[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at ip-10-237-132-179] PMI response to fd 6 pid 4: cmd=keyval_cache
P0-businesscard=description#
ec2-54-185-239-50.us-west-2.compute.amazonaws.com$port#34711$ifname#10.237.132.179$
P1-businesscard=description#ec2-54-196-213-218.compute-1.amazonaws.com
$port#50148$ifname#10.37.219.175$
[mpiexec at ip-10-237-132-179] PMI response to fd 7 pid 4: cmd=keyval_cache
P0-businesscard=description#
ec2-54-185-239-50.us-west-2.compute.amazonaws.com$port#34711$ifname#10.237.132.179$
P1-businesscard=description#ec2-54-196-213-218.compute-1.amazonaws.com
$port#50148$ifname#10.37.219.175$
[mpiexec at ip-10-237-132-179] PMI response to fd 6 pid 4: cmd=barrier_out
[mpiexec at ip-10-237-132-179] PMI response to fd 7 pid 4: cmd=barrier_out
[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=barrier_out
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): get
kvsname=kvs_1442_0 key=P1-businesscard
[proxy:0:0 at ip-10-237-132-179] PMI response: cmd=get_result rc=0 msg=success
value=description#ec2-54-196-213-218.compute-1.amazonaws.com
$port#50148$ifname#10.37.219.175$
[proxy:0:1 at ip-10-37-219-175] PMI response: cmd=barrier_out
[proxy:0:0 at ip-10-237-132-179] got pmi command (from 0): abort
exitcode=1174117
[proxy:0:0 at ip-10-237-132-179] we don't understand this command abort;
forwarding upstream
[mpiexec at ip-10-237-132-179] [pgid: 0] got PMI command: cmd=abort
exitcode=1174117
Fatal error in MPI_Send: Unknown error class, error stack:
MPI_Send(174)..............: MPI_Send(buf=0x7fffc219b73c, count=1, MPI_INT,
dest=1, tag=0, MPI_COMM_WORLD) failed
MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection
timed out
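
In case it helps when reading the business cards in the output above: each one appears to be a string of `key#value$` fields (`description`, `port`, `ifname`). The following is a minimal C sketch, not MPICH code, that pulls one field out of such a string and checks whether an IPv4 address falls in an RFC 1918 private range. The function names `bc_get_field` and `is_private_ipv4` are made up for illustration, and the card has to be joined onto a single line first (the mail wrapping above splits some of them). Whether MPICH connects to the `ifname` address or to the `description` hostname is an assumption that has not been verified here.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copy the value stored under `key` in a card of the form
 * "key#value$key#value$..." into `out` (hypothetical helper).
 * Returns 0 on success, -1 if the key is missing, the card is
 * malformed, or the value does not fit in `out`. */
int bc_get_field(const char *card, const char *key, char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    const char *p = card;

    while (*p != '\0') {
        const char *hash = strchr(p, '#');
        if (hash == NULL)
            return -1;
        const char *end = strchr(hash + 1, '$');
        if (end == NULL)
            return -1;

        if ((size_t)(hash - p) == key_len && strncmp(p, key, key_len) == 0) {
            size_t val_len = (size_t)(end - hash - 1);
            if (val_len >= out_len)
                return -1;
            memcpy(out, hash + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }
        p = end + 1;  /* skip to the next key#value$ field */
    }
    return -1;
}

/* Return 1 if `ip` is a dotted-quad IPv4 address in an RFC 1918 private
 * range, 0 if it is any other IPv4 address, -1 if it does not parse. */
int is_private_ipv4(const char *ip)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return -1;

    uint32_t a = ntohl(addr.s_addr);
    if ((a & 0xFF000000u) == 0x0A000000u) return 1;  /* 10.0.0.0/8     */
    if ((a & 0xFFF00000u) == 0xAC100000u) return 1;  /* 172.16.0.0/12  */
    if ((a & 0xFFFF0000u) == 0xC0A80000u) return 1;  /* 192.168.0.0/16 */
    return 0;
}
```

For rank 1's card from the log, `bc_get_field(card, "ifname", buf, sizeof buf)` yields `10.37.219.175`, and `is_private_ipv4` returns 1 for that address. The `ifname` in rank 0's card (10.237.132.179) is private as well.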

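The netcat check quoted further down can also be done from C, which makes it easy to try the exact address and port that appear in a business card. This is only a rough sketch under stated assumptions: `tcp_probe` is a hypothetical helper, it uses plain POSIX sockets rather than anything from MPICH, and the timeout covers only the connect step, not the name lookup.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to open a TCP connection to host:port, giving up on each candidate
 * address after timeout_ms milliseconds (hypothetical helper).
 * Returns 0 if a connection was established, -1 otherwise.
 * Note: getaddrinfo() itself is blocking and is not covered by the timeout. */
int tcp_probe(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res, *ai;
    int ok = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL && ok != 0; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking connect, so the wait is bounded by poll() below
         * instead of the kernel's much longer default connect timeout. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0) {
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
                ok = 0;
            } else if (errno == EINPROGRESS) {
                struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                if (poll(&pfd, 1, timeout_ms) == 1) {
                    /* The socket is ready: SO_ERROR tells us whether the
                     * connect succeeded (0) or failed (e.g. ECONNREFUSED). */
                    int err = 0;
                    socklen_t len = sizeof err;
                    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0
                        && err == 0)
                        ok = 0;
                }
            }
        }
        close(fd);
    }

    freeaddrinfo(res);
    return ok;
}
```

A call such as `tcp_probe("10.37.219.175", "50148", 5000)` would try the `ifname` address and port from rank 1's business card in the run above. The port differs on every run, so the probe is only meaningful while `send_recv` is still waiting on the other side.
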

On Wed, Mar 16, 2016 at 11:08 AM, Balaji, Pavan <balaji at anl.gov> wrote:

> Amelie,
>
> Can you run your mpiexec command with the -verbose option and paste the
> output here?
>
> % mpiexec -n 2 -f host_file -verbose ./send_recv_test
>
>   -- Pavan
>
> > On Mar 15, 2016, at 10:01 PM, amelie chi zhou <amelie.czhou at gmail.com> wrote:
> >
> > Hi, Ken,
> >
> > I tried netcat, and the connection was established successfully.
> >
> > On one machine, I ran:
> > ubuntu@ip-10-235-37-156:~$ netcat -l 10000
> >
> > On the other:
> > ubuntu@ip-10-169-125-85:~/mpitest$ netcat -v ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com 10000
> > Connection to ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com 10000 port [tcp/webmin] succeeded!
> >
> > On Wed, Mar 16, 2016 at 12:11 AM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
> > I suspect that there is still a firewall in the way given that the EC2
> > instances are in different regions. One way to test your security group
> > rules without MPI would be to try to establish a connection between the 2
> > machines on a high TCP port (e.g. 10000) with a simple utility like netcat
> > (https://en.wikipedia.org/wiki/Netcat).
> >
> > Ken
> >
> >
> > On 03/15/2016 10:38 AM, amelie chi zhou wrote:
> > Hi, Ken,
> >
> > Thanks for the reply.
> > What kind of problem are you referring to?
> > In the security group rules, I allow TCP connections from all IP
> > addresses on all ports. Also, the two machines can ssh and scp to each
> > other with no problem. In this simple test, security is not my major
> > concern.
> >
> > Regards,
> > Amelie
> > On 15 Mar 2016, at 10:23 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
> >
> > The different regions are a problem in this setup. Note that security
> > groups in EC2 are *per region*.
> >
> > https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#default-security-group
> >
> > I'll note that using MPI across the internet like this is a bad idea if
> > you have concerns about security.
> >
> > Ken
> >
> > On 03/15/2016 06:16 AM, amelie chi zhou wrote:
> > Hi,
> >
> > I configured two virtual machines on Amazon EC2 to run mpich-3.2. The
> > system is Ubuntu 12.04.2 LTS.
> >
> > The two virtual machines can ssh to each other successfully
> > (passwordless) and I can run a simple hello world program using the two
> > machines.
> >
> > ubuntu@ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./hello_world
> > Hello world from processor ip-10-169-125-85, rank 1 out of 2 processors
> > Hello world from processor ip-10-235-37-156, rank 0 out of 2 processors
> >
> > Then I ran a simple program that uses MPI_Send and MPI_Recv to communicate
> > between the two VMs. The core of the program is as follows.
> >
> >   if (world_rank == 0) {
> >     // If we are rank 0, set the number to -1 and send it to process 1
> >     number = -1;
> >     MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
> >   } else if (world_rank == 1) {
> >     MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
> >              MPI_STATUS_IGNORE);
> >     printf("Process 1 received number %d from process 0\n", number);
> >   }
> >
> >
> > This is the error message I encountered.
> >
> > ubuntu@ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./send_recv
> > Fatal error in MPI_Send: Unknown error class, error stack:
> > MPI_Send(174)..............: MPI_Send(buf=0x7fff49f2759c, count=1,
> > MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
> > MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection
> > timed out
> >
> >
> > I searched for similar errors and confirmed that 1) my firewall settings
> > contain no rules and 2) a TCP port is listening on both sides while the
> > send_recv program runs. I cannot think of any other way to fix this
> > problem. BTW, the two virtual machines are in two different regions of
> > Amazon EC2 and are not in VPCs. Please help. Thanks!
> >
> > Regards,
> > Amelie
> >
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss