[mpich-discuss] mpich hangs

Syed. Jahanzeb Maqbool Hashmi jahanzeb.maqbool at gmail.com
Thu Jun 27 22:36:18 CDT 2013


And here is that output:

Process 0 of 8 is on weiser1
Process 1 of 8 is on weiser1
Process 2 of 8 is on weiser1
Process 3 of 8 is on weiser1
Process 4 of 8 is on weiser2
Process 5 of 8 is on weiser2
Process 6 of 8 is on weiser2
Process 7 of 8 is on weiser2
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.018203
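
For reference, the cpi test above only exercises MPI_Bcast and MPI_Reduce
between the two hosts. A minimal sketch in the spirit of MPICH's
examples/cpi.c (not the shipped source verbatim) looks like this:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i, n = 10000;
        double h, x, sum = 0.0, mypi, pi;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Broadcast the interval count, integrate a slice of
           4/(1+x^2) on each rank, then sum the partial results
           on rank 0 -- the same pattern cpi uses. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        h = 1.0 / (double)n;
        for (i = rank; i < n; i += size) {
            x = h * ((double)i + 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);
        MPI_Finalize();
        return 0;
    }

Since this runs cleanly across weiser1 and weiser2, basic TCP wire-up
between the nodes is fine; whatever is failing is specific to the heavier
xhpl run.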

---------------


On Fri, Jun 28, 2013 at 12:35 PM, Syed. Jahanzeb Maqbool Hashmi <
jahanzeb.maqbool at gmail.com> wrote:

> Yes, I am able to run the cpi program successfully. No such error at all.
>
>
>
> On Fri, Jun 28, 2013 at 12:31 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
>> Can you run the cpi program?  If that doesn't run, something is wrong,
>> because that program is trivial and correct.
>>
>> Jeff
>>
>> On Thu, Jun 27, 2013 at 10:29 PM, Syed. Jahanzeb Maqbool Hashmi
>> <jahanzeb.maqbool at gmail.com> wrote:
>> > Again, that same error:
>> > Fatal error in PMPI_Wait: A process has failed, error stack:
>> > PMPI_Wait(180)............: MPI_Wait(request=0xbebb9a1c, status=0xbebb99f0) failed
>> > MPIR_Wait_impl(77)........:
>> > dequeue_and_set_error(888): Communication error with rank 4
>> >
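
A note on reading that stack: MPI_Wait is rarely the culprit itself; it is
simply where the death of a peer gets detected. A hedged sketch of the
pattern (not HPL's actual code) that reports exactly this error if rank 4
dies mid-run, e.g. when launched with mpiexec -n 8:

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double buf = 0.0;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Block waiting on rank 4; if rank 4 is killed before it
               sends, dequeue_and_set_error reports "Communication
               error with rank 4" out of this MPI_Wait. */
            MPI_Irecv(&buf, 1, MPI_DOUBLE, 4, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, &status);
        } else if (rank == 4) {
            MPI_Send(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

For what it's worth, the "EXIT CODE: 9" further down usually corresponds
to SIGKILL, which on small-memory nodes often points at the kernel OOM
killer rather than at MPI itself.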
>> > here is the verbose output:
>> >
>> > --------------START------------------
>> >
>> > host: weiser1
>> > host: weiser2
>> >
>> >
>> ==================================================================================================
>> > mpiexec options:
>> > ----------------
>> >   Base path: /mnt/nfs/install/mpich-install/bin/
>> >   Launcher: (null)
>> >   Debug level: 1
>> >   Enable X: -1
>> >
>> >   Global environment:
>> >   -------------------
>> >     TERM=xterm
>> >     SHELL=/bin/bash
>> >
>> >
>> XDG_SESSION_COOKIE=218a1dd8e20ea6d6ec61475b00000019-1372384778.679329-1845893422
>> >     SSH_CLIENT=192.168.0.3 57311 22
>> >     OLDPWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1
>> >     SSH_TTY=/dev/pts/0
>> >     USER=linaro
>> >     LS_COLORS=[long value elided]
>> >     LD_LIBRARY_PATH=:/mnt/nfs/install/mpich-install/lib
>> >     MAIL=/var/mail/linaro
>> >
>> >
>> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/mnt/nfs/install/mpich-install/bin
>> >     PWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a
>> >     LANG=C.UTF-8
>> >     SHLVL=1
>> >     HOME=/home/linaro
>> >     LOGNAME=linaro
>> >     SSH_CONNECTION=192.168.0.3 57311 192.168.0.101 22
>> >     LESSOPEN=| /usr/bin/lesspipe %s
>> >     LESSCLOSE=/usr/bin/lesspipe %s %s
>> >     _=/mnt/nfs/install/mpich-install/bin/mpiexec
>> >
>> >   Hydra internal environment:
>> >   ---------------------------
>> >     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>> >
>> >
>> >     Proxy information:
>> >     *********************
>> >       [1] proxy: weiser1 (4 cores)
>> >       Exec list: ./xhpl (4 processes);
>> >
>> >       [2] proxy: weiser2 (4 cores)
>> >       Exec list: ./xhpl (4 processes);
>> >
>> >
>> >
>> ==================================================================================================
>> >
>> > [mpiexec at weiser1] Timeout set to -1 (-1 means infinite)
>> > [mpiexec at weiser1] Got a control port string of weiser1:45851
>> >
>> > Proxy launch args: /mnt/nfs/install/mpich-install/bin/hydra_pmi_proxy
>> > --control-port weiser1:45851 --debug --rmk user --launcher ssh --demux poll
>> > --pgid 0 --retries 10 --usize -2 --proxy-id
>> >
>> > Arguments being passed to proxy 0:
>> > --version 3.0.4 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname
>> > weiser1 --global-core-map 0,4,8 --pmi-id-map 0,0 --global-process-count 8
>> > --auto-cleanup 1 --pmi-kvsname kvs_24541_0 --pmi-process-mapping
>> > (vector,(0,2,4)) --ckpoint-num -1 --global-inherited-env 20 'TERM=xterm'
>> > 'SHELL=/bin/bash'
>> >
>> 'XDG_SESSION_COOKIE=218a1dd8e20ea6d6ec61475b00000019-1372384778.679329-1845893422'
>> > 'SSH_CLIENT=192.168.0.3 57311 22'
>> > 'OLDPWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1' 'SSH_TTY=/dev/pts/0'
>> > 'USER=linaro'
>> > 'LS_COLORS=[long value elided]'
>> > 'LD_LIBRARY_PATH=:/mnt/nfs/install/mpich-install/lib'
>> > 'MAIL=/var/mail/linaro'
>> >
>> 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/mnt/nfs/install/mpich-install/bin'
>> > 'PWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a' 'LANG=C.UTF-8'
>> > 'SHLVL=1' 'HOME=/home/linaro' 'LOGNAME=linaro'
>> > 'SSH_CONNECTION=192.168.0.3 57311 192.168.0.101 22' 'LESSOPEN=| /usr/bin/lesspipe %s'
>> > 'LESSCLOSE=/usr/bin/lesspipe %s %s'
>> > '_=/mnt/nfs/install/mpich-install/bin/mpiexec' --global-user-env 0
>> > --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
>> > --proxy-core-count 4 --exec --exec-appnum 0 --exec-proc-count 4
>> > --exec-local-env 0 --exec-wdir
>> > /mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a --exec-args 1 ./xhpl
>> >
>> > Arguments being passed to proxy 1:
>> > --version 3.0.4 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname
>> > weiser2 --global-core-map 0,4,8 --pmi-id-map 0,4 --global-process-count 8
>> > --auto-cleanup 1 --pmi-kvsname kvs_24541_0 --pmi-process-mapping
>> > (vector,(0,2,4)) --ckpoint-num -1 --global-inherited-env 20 'TERM=xterm'
>> > 'SHELL=/bin/bash'
>> >
>> 'XDG_SESSION_COOKIE=218a1dd8e20ea6d6ec61475b00000019-1372384778.679329-1845893422'
>> > 'SSH_CLIENT=192.168.0.3 57311 22'
>> > 'OLDPWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1' 'SSH_TTY=/dev/pts/0'
>> > 'USER=linaro'
>> > 'LS_COLORS=[long value elided]'
>> > 'LD_LIBRARY_PATH=:/mnt/nfs/install/mpich-install/lib'
>> > 'MAIL=/var/mail/linaro'
>> >
>> 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/mnt/nfs/install/mpich-install/bin'
>> > 'PWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a' 'LANG=C.UTF-8'
>> > 'SHLVL=1' 'HOME=/home/linaro' 'LOGNAME=linaro'
>> > 'SSH_CONNECTION=192.168.0.3 57311 192.168.0.101 22' 'LESSOPEN=| /usr/bin/lesspipe %s'
>> > 'LESSCLOSE=/usr/bin/lesspipe %s %s'
>> > '_=/mnt/nfs/install/mpich-install/bin/mpiexec' --global-user-env 0
>> > --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
>> > --proxy-core-count 4 --exec --exec-appnum 0 --exec-proc-count 4
>> > --exec-local-env 0 --exec-wdir
>> > /mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a --exec-args 1 ./xhpl
>> >
>> > [mpiexec at weiser1] Launch arguments:
>> > /mnt/nfs/install/mpich-install/bin/hydra_pmi_proxy --control-port
>> > weiser1:45851 --debug --rmk user --launcher ssh --demux poll --pgid 0
>> > --retries 10 --usize -2 --proxy-id 0
>> > [mpiexec at weiser1] Launch arguments: /usr/bin/ssh -x weiser2
>> > "/mnt/nfs/install/mpich-install/bin/hydra_pmi_proxy" --control-port
>> > weiser1:45851 --debug --rmk user --launcher ssh --demux poll --pgid 0
>> > --retries 10 --usize -2 --proxy-id 1
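
(The second launch line is plain ssh, so the transport to weiser2 can be
sanity-checked by hand; for example

    /usr/bin/ssh -x weiser2 hostname

should print "weiser2" immediately, with no password prompt.)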
>> > [proxy:0:0 at weiser1] got pmi command (from 0): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 15): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 8): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 6): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): put
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] cached command:
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 0): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 8): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 6): barrier_in
>> >
>> > [proxy:0:0 at weiser1] flushing 1 put command(s) out
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] forwarding command (cmd=put
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9) upstream
>> > [proxy:0:0 at weiser1] forwarding command (cmd=barrier_in) upstream
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [proxy:0:1 at weiser2] got pmi command (from 7): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 5): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 4): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): put
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] cached command:
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): barrier_in
>> >
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=keyval_cache
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=keyval_cache
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=barrier_out
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=barrier_out
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 10): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 5): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 10): barrier_in
>> >
>> > [proxy:0:1 at weiser2] flushing 1 put command(s) out
>> > [proxy:0:1 at weiser2] forwarding command (cmd=put
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa) upstream
>> > [proxy:0:1 at weiser2] forwarding command (cmd=barrier_in) upstream
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] got pmi command (from 0): put
>> > kvsname=kvs_24541_0 key=P0-businesscard
>> > value=description#weiser1$port#56190$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 8): put
>> > kvsname=kvs_24541_0 key=P2-businesscard
>> > value=description#weiser1$port#40019$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 15): put
>> > kvsname=kvs_24541_0 key=P3-businesscard
>> > value=description#weiser1$port#57150$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 0): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): put
>> > kvsname=kvs_24541_0 key=P1-businesscard
>> > value=description#weiser1$port#34048$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 8): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 15): barrier_in
>> >
>> > [proxy:0:0 at weiser1] flushing 4 put command(s) out
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] forwarding command (cmd=put
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$)
>> > upstream
>> > [proxy:0:0 at weiser1] forwarding command (cmd=barrier_in) upstream
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [proxy:0:1 at weiser2] got pmi command (from 4): put
>> > kvsname=kvs_24541_0 key=P4-businesscard
>> > value=description#weiser2$port#60693$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 5): put
>> > kvsname=kvs_24541_0 key=P5-businesscard
>> > value=description#weiser2$port#49938$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 7): put
>> > kvsname=kvs_24541_0 key=P6-businesscard
>> > value=description#weiser2$port#33516$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 10): put
>> > kvsname=kvs_24541_0 key=P7-businesscard
>> > value=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 4): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 5): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 7): barrier_in
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=keyval_cache
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=keyval_cache
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=barrier_out
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1]
>> > [proxy:0:1 at weiser2] got pmi command (from 10): barrier_in
>> >
>> > [proxy:0:1 at weiser2] flushing 4 put command(s) out
>> > [proxy:0:1 at weiser2] forwarding command (cmd=put
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$)
>> > upstream
>> > [proxy:0:1 at weiser2] forwarding command (cmd=barrier_in) upstream
>> > PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get
>> > kvsname=kvs_24541_0 key=P0-businesscard
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser1$port#56190$ifname#192.168.0.101$
>> >
>> > ================================================================================
>> > HPLinpack 2.1  --  High-Performance Linpack benchmark  --  October 26, 2012
>> > Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
>> > Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
>> > Modified by Julien Langou, University of Colorado Denver
>> > ================================================================================
>> >
>> > An explanation of the input/output parameters follows:
>> > T/V    : Wall time / encoded variant.
>> > N      : The order of the coefficient matrix A.
>> > NB     : The partitioning blocking factor.
>> > P      : The number of process rows.
>> > Q      : The number of process columns.
>> > Time   : Time in seconds to solve the linear system.
>> > Gflops : Rate of execution for solving the linear system.
>> >
>> > The following parameter values will be used:
>> >
>> > N      :   14616
>> > NB     :     168
>> > PMAP   : Row-major process mapping
>> > P      :       2
>> > Q      :       4
>> > PFACT  :   Right
>> > NBMIN  :       4
>> > NDIV   :       2
>> > RFACT  :   Crout
>> > BCAST  :  1ringM
>> > DEPTH  :       1
>> > SWAP   : Mix (threshold = 64)
>> > L1     : transposed form
>> > U      : transposed form
>> > EQUIL  : yes
>> > ALIGN  : 8 double precision words
>> >
>> >
>> --------------------------------------------------------------------------------
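
(A quick sanity check on these values: the coefficient matrix alone needs
N x N x 8 bytes = 14616^2 x 8 ~ 1.71 GB, split over P x Q = 2 x 4 = 8
ranks -- roughly 210 MB per process before HPL's working buffers, which
is worth keeping in mind on small-memory ARM boards.)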
>> >
>> > - The matrix A is randomly generated for each test.
>> > - The following scaled residual check will be computed:
>> >       ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
>> > - The relative machine precision (eps) is taken to be 1.110223e-16
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get
>> > - Computational tests pass if scaled residuals are less than 16.0
>> >
>> > kvsname=kvs_24541_0 key=P5-businesscard
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser2$port#49938$ifname#192.168.0.102$
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get
>> > kvsname=kvs_24541_0 key=P7-businesscard
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get
>> > kvsname=kvs_24541_0 key=P6-businesscard
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser2$port#33516$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get
>> > kvsname=kvs_24541_0 key=P1-businesscard
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser1$port#34048$ifname#192.168.0.101$
>> >
>> >
>> ===================================================================================
>> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > =   EXIT CODE: 9
>> > =   CLEANING UP REMAINING PROCESSES
>> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >
>> ===================================================================================
>> >
>> >
>> > ----------- END --------------
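
An aside on reading the trace: the put/get/barrier chatter above is the
normal PMI-1 "business card" exchange MPICH performs during MPI_Init. In
terms of MPICH's pmi.h, each rank effectively does the following (a
sketch, not MPICH's literal code; the card string is copied from the log
above):

    #include <stdio.h>
    #include "pmi.h"   /* MPICH's PMI-1 client API */

    /* Publish this rank's "business card" (how to reach it over TCP),
       synchronize, then look up a peer's card. */
    void wireup_sketch(int myrank, int peer)
    {
        int spawned;
        char kvsname[256], key[64], card[1024];

        PMI_Init(&spawned);                            /* cmd=init       */
        PMI_KVS_Get_my_name(kvsname, sizeof kvsname);  /* get_my_kvsname */

        snprintf(key, sizeof key, "P%d-businesscard", myrank);
        PMI_KVS_Put(kvsname, key,                      /* cmd=put        */
            "description#weiser1$port#56190$ifname#192.168.0.101$");
        PMI_KVS_Commit(kvsname);
        PMI_Barrier();                                 /* barrier_in/out */

        snprintf(key, sizeof key, "P%d-businesscard", peer);
        PMI_KVS_Get(kvsname, key, card, sizeof card);  /* cmd=get        */
    }

Note that the trace shows this exchange completing for all eight ranks
(every business card is put and fetched), so wire-up succeeds and the
failure only happens once real HPL traffic starts flowing.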
>> >
>> > Hope that helps :(
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Jun 28, 2013 at 12:24 PM, Pavan Balaji <balaji at mcs.anl.gov>
>> wrote:
>> >>
>> >>
>> >> Looks like your application aborted for some reason.
>> >>
>> >>  -- Pavan
>> >>
>> >>
>> >> On 06/27/2013 10:21 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
>> >>>
>> >>> My bad, I just found out that there was a duplicate entry like:
>> >>> weiser1 127.0.1.1
>> >>> weiser1 192.168.0.101
>> >>> so I removed the 127.x.x.x entry and kept the /etc/hosts contents
>> >>> identical on both nodes. Now the previous error is reduced to this one:
>> >>>
>> >>> ------ START OF OUTPUT -------
>> >>>
>> >>> ....some HPL startup string (no final result)
>> >>> ...skip.....
>> >>>
>> >>>
>> >>>
>> ===================================================================================
>> >>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >>> =   EXIT CODE: 9
>> >>> =   CLEANING UP REMAINING PROCESSES
>> >>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >>>
>> >>>
>> ===================================================================================
>> >>> [proxy:0:0 at weiser1] HYD_pmcd_pmip_control_cmd_cb
>> >>> (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
>> >>> [proxy:0:0 at weiser1] HYDT_dmxu_poll_wait_for_event
>> >>> (./tools/demux/demux_poll.c:77): callback returned error status
>> >>> [proxy:0:0 at weiser1] main (./pm/pmiserv/pmip.c:206): demux engine error
>> >>> waiting for event
>> >>> [mpiexec at weiser1] HYDT_bscu_wait_for_completion
>> >>> (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes
>> >>> terminated badly; aborting
>> >>> [mpiexec at weiser1] HYDT_bsci_wait_for_completion
>> >>> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
>> >>> for completion
>> >>> [mpiexec at weiser1] HYD_pmci_wait_for_completion
>> >>> (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for
>> >>> completion
>> >>> [mpiexec at weiser1] main (./ui/mpich/mpiexec.c:331): process manager error
>> >>> waiting for completion
>> >>>
>> >>> ------ END OF OUTPUT -------
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Jun 28, 2013 at 12:12 PM, Pavan Balaji <balaji at mcs.anl.gov
>> >>> <mailto:balaji at mcs.anl.gov>> wrote:
>> >>>
>> >>>
>> >>>     On 06/27/2013 10:08 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
>> >>>
>> >>> P4-businesscard=description#weiser2$port#57651$ifname#192.168.0.102$
>> >>> P5-businesscard=description#weiser2$port#52622$ifname#192.168.0.102$
>> >>> P6-businesscard=description#weiser2$port#55935$ifname#192.168.0.102$
>> >>> P7-businesscard=description#weiser2$port#54952$ifname#192.168.0.102$
>> >>> P0-businesscard=description#weiser1$port#41958$ifname#127.0.1.1$
>> >>> P2-businesscard=description#weiser1$port#35049$ifname#127.0.1.1$
>> >>> P1-businesscard=description#weiser1$port#39634$ifname#127.0.1.1$
>> >>> P3-businesscard=description#weiser1$port#51802$ifname#127.0.1.1$
>> >>>
>> >>>
>> >>>
>> >>>     I have two concerns with your output.  Let's start with the first.
>> >>>
>> >>>     Did you look at this question on the FAQ page?
>> >>>
>> >>>     "Is your /etc/hosts file consistent across all nodes? Unless you
>> are
>> >>>     using an external DNS server, the /etc/hosts file on every machine
>> >>>     should contain the correct IP information about all hosts in the
>> >>>     system."
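
Concretely, for this cluster that means /etc/hosts on both weiser1 and
weiser2 should carry something like (addresses taken from the log):

    127.0.0.1      localhost
    192.168.0.101  weiser1
    192.168.0.102  weiser2

and, crucially, no 127.0.1.1 alias for the node's own hostname --
otherwise ranks on that node advertise a loopback business card
(ifname#127.0.1.1$, as in the output above) that processes on the other
node cannot connect to.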
>> >>>
>> >>>
>> >>>       -- Pavan
>> >>>
>> >>>     --
>> >>>     Pavan Balaji
>> >>>     http://www.mcs.anl.gov/~balaji
>> >>>
>> >>>
>> >>
>> >> --
>> >> Pavan Balaji
>> >> http://www.mcs.anl.gov/~balaji
>> >
>> >
>> >
>> > _______________________________________________
>> > discuss mailing list     discuss at mpich.org
>> > To manage subscription options or unsubscribe:
>> > https://lists.mpich.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>
>