[mpich-discuss] mpich hangs

Jeff Hammond jeff.science at gmail.com
Thu Jun 27 22:48:17 CDT 2013


If cpi runs and your code doesn't, it's an application issue. You said this was
HPL? Ask UTK for support with this; it's their code. HPL is dirt simple, so my
guess is that you are running it incorrectly.
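
A quick sanity check along those lines, sketched here with paths taken from the
log further below (the hostfile name is illustrative only): start xhpl from the
directory containing HPL.dat, and make sure the process grid fits the launch,
i.e. P x Q should not exceed (and normally equals) the -n count, 2 x 4 = 8 in
this run.

    cd /mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a
    grep -E "Ps|Qs" HPL.dat          # expect 2 and 4 for an 8-process run
    mpiexec -f hosts -n 8 ./xhpl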

Jeff

Sent from my iPhone

On Jun 27, 2013, at 10:36 PM, "Syed. Jahanzeb Maqbool Hashmi" <
jahanzeb.maqbool at gmail.com> wrote:

and here is that output:

Process 0 of 8 is on weiser1
Process 1 of 8 is on weiser1
Process 2 of 8 is on weiser1
Process 3 of 8 is on weiser1
Process 4 of 8 is on weiser2
Process 5 of 8 is on weiser2
Process 6 of 8 is on weiser2
Process 7 of 8 is on weiser2
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.018203

---------------
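
For context, output like the above comes from launching cpi across both nodes;
a command of roughly this form would do it (the hostfile name here is
illustrative, not taken from the log):

    /mnt/nfs/install/mpich-install/bin/mpiexec -f hosts -n 8 ./cpi

where a Hydra-style hostfile such as

    weiser1:4
    weiser2:4

places four processes on each node.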


On Fri, Jun 28, 2013 at 12:35 PM, Syed. Jahanzeb Maqbool Hashmi <
jahanzeb.maqbool at gmail.com> wrote:

> Yes, I am able to run the cpi program successfully. No such error at all.
>
>
>
> On Fri, Jun 28, 2013 at 12:31 PM, Jeff Hammond <jeff.science at gmail.com>wrote:
>
>> Can you run the cpi program?  If that doesn't run, something is wrong,
>> because that program is trivial and correct.
>>
>> Jeff
>>
>> On Thu, Jun 27, 2013 at 10:29 PM, Syed. Jahanzeb Maqbool Hashmi
>> <jahanzeb.maqbool at gmail.com> wrote:
>> > Again, that same error:
>> > Fatal error in PMPI_Wait: A process has failed, error stack:
>> > PMPI_Wait(180)............: MPI_Wait(request=0xbebb9a1c, status=0xbebb99f0) failed
>> > MPIR_Wait_impl(77)........:
>> > dequeue_and_set_error(888): Communication error with rank 4
>> >
>> > here is the verbose output:
>> >
>> > --------------START------------------
>> >
>> > host: weiser1
>> > host: weiser2
>> >
>> >
>> ==================================================================================================
>> > mpiexec options:
>> > ----------------
>> >   Base path: /mnt/nfs/install/mpich-install/bin/
>> >   Launcher: (null)
>> >   Debug level: 1
>> >   Enable X: -1
>> >
>> >   Global environment:
>> >   -------------------
>> >     TERM=xterm
>> >     SHELL=/bin/bash
>> >
>> >
>> XDG_SESSION_COOKIE=218a1dd8e20ea6d6ec61475b00000019-1372384778.679329-1845893422
>> >     SSH_CLIENT=192.168.0.3 57311 22
>> >     OLDPWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1
>> >     SSH_TTY=/dev/pts/0
>> >     USER=linaro
>> >
>> >
>> LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35
>>
>>  :*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
>> >     LD_LIBRARY_PATH=:/mnt/nfs/install/mpich-install/lib
>> >     MAIL=/var/mail/linaro
>> >
>> >
>> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/mnt/nfs/install/mpich-install/bin
>> >     PWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a
>> >     LANG=C.UTF-8
>> >     SHLVL=1
>> >     HOME=/home/linaro
>> >     LOGNAME=linaro
>> >     SSH_CONNECTION=192.168.0.3 57311 192.168.0.101 22
>> >     LESSOPEN=| /usr/bin/lesspipe %s
>> >     LESSCLOSE=/usr/bin/lesspipe %s %s
>> >     _=/mnt/nfs/install/mpich-install/bin/mpiexec
>> >
>> >   Hydra internal environment:
>> >   ---------------------------
>> >     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>> >
>> >
>> >     Proxy information:
>> >     *********************
>> >       [1] proxy: weiser1 (4 cores)
>> >       Exec list: ./xhpl (4 processes);
>> >
>> >       [2] proxy: weiser2 (4 cores)
>> >       Exec list: ./xhpl (4 processes);
>> >
>> >
>> >
>> ==================================================================================================
>> >
>> > [mpiexec at weiser1] Timeout set to -1 (-1 means infinite)
>> > [mpiexec at weiser1] Got a control port string of weiser1:45851
>> >
>> > Proxy launch args: /mnt/nfs/install/mpich-install/bin/hydra_pmi_proxy
>> > --control-port weiser1:45851 --debug --rmk user --launcher ssh --demux
>> poll
>> > --pgid 0 --retries 10 --usize -2 --proxy-id
>> >
>> > Arguments being passed to proxy 0:
>> > --version 3.0.4 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname
>> > weiser1 --global-core-map 0,4,8 --pmi-id-map 0,0 --global-process-count
>> 8
>> > --auto-cleanup 1 --pmi-kvsname kvs_24541_0 --pmi-process-mapping
>> > (vector,(0,2,4)) --ckpoint-num -1 --global-inherited-env 20 'TERM=xterm'
>> > 'SHELL=/bin/bash'
>> >
>> 'XDG_SESSION_COOKIE=218a1dd8e20ea6d6ec61475b00000019-1372384778.679329-1845893422'
>> > 'SSH_CLIENT=192.168.0.3 57311 22'
>> > 'OLDPWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1' 'SSH_TTY=/dev/pts/0'
>> > 'USER=linaro'
>> >
>> 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;3
>>
>>  5:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:'
>> > 'LD_LIBRARY_PATH=:/mnt/nfs/install/mpich-install/lib'
>> > 'MAIL=/var/mail/linaro'
>> >
>> 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/mnt/nfs/install/mpich-install/bin'
>> > 'PWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a' 'LANG=C.UTF-8'
>> > 'SHLVL=1' 'HOME=/home/linaro' 'LOGNAME=linaro'
>> 'SSH_CONNECTION=192.168.0.3
>> > 57311 192.168.0.101 22' 'LESSOPEN=| /usr/bin/lesspipe %s'
>> > 'LESSCLOSE=/usr/bin/lesspipe %s %s'
>> > '_=/mnt/nfs/install/mpich-install/bin/mpiexec' --global-user-env 0
>> > --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
>> > --proxy-core-count 4 --exec --exec-appnum 0 --exec-proc-count 4
>> > --exec-local-env 0 --exec-wdir
>> > /mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a --exec-args 1 ./xhpl
>> >
>> > Arguments being passed to proxy 1:
>> > --version 3.0.4 --iface-ip-env-name MPICH_INTERFACE_HOSTNAME --hostname
>> > weiser2 --global-core-map 0,4,8 --pmi-id-map 0,4 --global-process-count
>> 8
>> > --auto-cleanup 1 --pmi-kvsname kvs_24541_0 --pmi-process-mapping
>> > (vector,(0,2,4)) --ckpoint-num -1 --global-inherited-env 20 'TERM=xterm'
>> > 'SHELL=/bin/bash'
>> >
>> 'XDG_SESSION_COOKIE=218a1dd8e20ea6d6ec61475b00000019-1372384778.679329-1845893422'
>> > 'SSH_CLIENT=192.168.0.3 57311 22'
>> > 'OLDPWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1' 'SSH_TTY=/dev/pts/0'
>> > 'USER=linaro'
>> >
>> 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;3
>>
>>  5:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:'
>> > 'LD_LIBRARY_PATH=:/mnt/nfs/install/mpich-install/lib'
>> > 'MAIL=/var/mail/linaro'
>> >
>> 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/mnt/nfs/install/mpich-install/bin'
>> > 'PWD=/mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a' 'LANG=C.UTF-8'
>> > 'SHLVL=1' 'HOME=/home/linaro' 'LOGNAME=linaro'
>> 'SSH_CONNECTION=192.168.0.3
>> > 57311 192.168.0.101 22' 'LESSOPEN=| /usr/bin/lesspipe %s'
>> > 'LESSCLOSE=/usr/bin/lesspipe %s %s'
>> > '_=/mnt/nfs/install/mpich-install/bin/mpiexec' --global-user-env 0
>> > --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
>> > --proxy-core-count 4 --exec --exec-appnum 0 --exec-proc-count 4
>> > --exec-local-env 0 --exec-wdir
>> > /mnt/nfs/jahanzeb/bench/hpl/hpl-2.1/bin/armv7-a --exec-args 1 ./xhpl
>> >
>> > [mpiexec at weiser1] Launch arguments:
>> > /mnt/nfs/install/mpich-install/bin/hydra_pmi_proxy --control-port
>> > weiser1:45851 --debug --rmk user --launcher ssh --demux poll --pgid 0
>> > --retries 10 --usize -2 --proxy-id 0
>> > [mpiexec at weiser1] Launch arguments: /usr/bin/ssh -x weiser2
>> > "/mnt/nfs/install/mpich-install/bin/hydra_pmi_proxy" --control-port
>> > weiser1:45851 --debug --rmk user --launcher ssh --demux poll --pgid 0
>> > --retries 10 --usize -2 --proxy-id 1
>> > [proxy:0:0 at weiser1] got pmi command (from 0): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 15): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 8): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 6): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:0 at weiser1] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 0): put
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] cached command:
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 0): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_maxes
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 15): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_appnum
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=appnum appnum=0
>> > [proxy:0:0 at weiser1] got pmi command (from 8): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get_my_kvsname
>> >
>> > [proxy:0:0 at weiser1] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:0 at weiser1] got pmi command (from 6): barrier_in
>> >
>> > [proxy:0:0 at weiser1] flushing 1 put command(s) out
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] forwarding command (cmd=put
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9) upstream
>> > [proxy:0:0 at weiser1] forwarding command (cmd=barrier_in) upstream
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [proxy:0:1 at weiser2] got pmi command (from 7): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 5): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 4): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 7): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): put
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] cached command:
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 4): barrier_in
>> >
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=keyval_cache
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=keyval_cache
>> > sharedFilename[0]=/dev/shm/mpich_shar_tmpnEZdQ9
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=barrier_out
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=barrier_out
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 10): init
>> > pmi_version=1 pmi_subversion=1
>> > [proxy:0:1 at weiser2] PMI response: cmd=response_to_init pmi_version=1
>> > pmi_subversion=1 rc=0
>> > [proxy:0:1 at weiser2] got pmi command (from 5): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_maxes
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=maxes kvsname_max=256
>> keylen_max=64
>> > vallen_max=1024
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_appnum
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=appnum appnum=0
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get_my_kvsname
>> >
>> > [proxy:0:1 at weiser2] PMI response: cmd=my_kvsname kvsname=kvs_24541_0
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get
>> > kvsname=kvs_24541_0 key=PMI_process_mapping
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=(vector,(0,2,4))
>> > [proxy:0:1 at weiser2] got pmi command (from 10): barrier_in
>> >
>> > [proxy:0:1 at weiser2] flushing 1 put command(s) out
>> > [proxy:0:1 at weiser2] forwarding command (cmd=put
>> > sharedFilename[4]=/dev/shm/mpich_shar_tmpuKzlSa) upstream
>> > [proxy:0:1 at weiser2] forwarding command (cmd=barrier_in) upstream
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] got pmi command (from 7): get
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:1 at weiser2] got pmi command (from 10): get
>> > kvsname=kvs_24541_0 key=sharedFilename[4]
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpuKzlSa
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get
>> > kvsname=kvs_24541_0 key=sharedFilename[0]
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=/dev/shm/mpich_shar_tmpnEZdQ9
>> > [proxy:0:0 at weiser1] got pmi command (from 0): put
>> > kvsname=kvs_24541_0 key=P0-businesscard
>> > value=description#weiser1$port#56190$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 8): put
>> > kvsname=kvs_24541_0 key=P2-businesscard
>> > value=description#weiser1$port#40019$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 15): put
>> > kvsname=kvs_24541_0 key=P3-businesscard
>> > value=description#weiser1$port#57150$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 0): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): put
>> > kvsname=kvs_24541_0 key=P1-businesscard
>> > value=description#weiser1$port#34048$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] cached command:
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:0 at weiser1] got pmi command (from 8): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 6): barrier_in
>> >
>> > [proxy:0:0 at weiser1] got pmi command (from 15): barrier_in
>> >
>> > [proxy:0:0 at weiser1] flushing 4 put command(s) out
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=put
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > [proxy:0:0 at weiser1] forwarding command (cmd=put
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$)
>> > upstream
>> > [proxy:0:0 at weiser1] forwarding command (cmd=barrier_in) upstream
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [proxy:0:1 at weiser2] got pmi command (from 4): put
>> > kvsname=kvs_24541_0 key=P4-businesscard
>> > value=description#weiser2$port#60693$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 5): put
>> > kvsname=kvs_24541_0 key=P5-businesscard
>> > value=description#weiser2$port#49938$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 7): put
>> > kvsname=kvs_24541_0 key=P6-businesscard
>> > value=description#weiser2$port#33516$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 10): put
>> > kvsname=kvs_24541_0 key=P7-businesscard
>> > value=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] cached command:
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] [mpiexec at weiser1] [pgid: 0] got PMI command:
>> cmd=put
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > PMI response: cmd=put_result rc=0 msg=success
>> > [proxy:0:1 at weiser2] got pmi command (from 4): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 5): barrier_in
>> >
>> > [proxy:0:1 at weiser2] got pmi command (from 7): barrier_in
>> > [mpiexec at weiser1] [pgid: 0] got PMI command: cmd=barrier_in
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=keyval_cache
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=keyval_cache
>> > P0-businesscard=description#weiser1$port#56190$ifname#192.168.0.101$
>> > P2-businesscard=description#weiser1$port#40019$ifname#192.168.0.101$
>> > P3-businesscard=description#weiser1$port#57150$ifname#192.168.0.101$
>> > P1-businesscard=description#weiser1$port#34048$ifname#192.168.0.101$
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [mpiexec at weiser1] PMI response to fd 6 pid 10: cmd=barrier_out
>> > [mpiexec at weiser1] PMI response to fd 7 pid 10: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1]
>> > [proxy:0:1 at weiser2] got pmi command (from 10): barrier_in
>> >
>> > [proxy:0:1 at weiser2] flushing 4 put command(s) out
>> > [proxy:0:1 at weiser2] forwarding command (cmd=put
>> > P4-businesscard=description#weiser2$port#60693$ifname#192.168.0.102$
>> > P5-businesscard=description#weiser2$port#49938$ifname#192.168.0.102$
>> > P6-businesscard=description#weiser2$port#33516$ifname#192.168.0.102$
>> > P7-businesscard=description#weiser2$port#43116$ifname#192.168.0.102$)
>> > upstream
>> > [proxy:0:1 at weiser2] forwarding command (cmd=barrier_in) upstream
>> > PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:0 at weiser1] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] PMI response: cmd=barrier_out
>> > [proxy:0:1 at weiser2] got pmi command (from 4): get
>> > kvsname=kvs_24541_0 key=P0-businesscard
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser1$port#56190$ifname#192.168.0.101$
>> >
>> ================================================================================
>> > HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
>> > Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
>> > Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
>> > Modified by Julien Langou, University of Colorado Denver
>> >
>> ================================================================================
>> >
>> > An explanation of the input/output parameters follows:
>> > T/V    : Wall time / encoded variant.
>> > N      : The order of the coefficient matrix A.
>> > NB     : The partitioning blocking factor.
>> > P      : The number of process rows.
>> > Q      : The number of process columns.
>> > Time   : Time in seconds to solve the linear system.
>> > Gflops : Rate of execution for solving the linear system.
>> >
>> > The following parameter values will be used:
>> >
>> > N      :   14616
>> > NB     :     168
>> > PMAP   : Row-major process mapping
>> > P      :       2
>> > Q      :       4
>> > PFACT  :   Right
>> > NBMIN  :       4
>> > NDIV   :       2
>> > RFACT  :   Crout
>> > BCAST  :  1ringM
>> > DEPTH  :       1
>> > SWAP   : Mix (threshold = 64)
>> > L1     : transposed form
>> > U      : transposed form
>> > EQUIL  : yes
>> > ALIGN  : 8 double precision words
>> >
>> >
>> --------------------------------------------------------------------------------
>> >
>> > - The matrix A is randomly generated for each test.
>> > - The following scaled residual check will be computed:
>> >       ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
>> > - The relative machine precision (eps) is taken to be 1.110223e-16
>> > [proxy:0:0 at weiser1] got pmi command (from 6): get
>> > - Computational tests pass if scaled residuals are less than 16.0
>> >
>> > kvsname=kvs_24541_0 key=P5-businesscard
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser2$port#49938$ifname#192.168.0.102$
>> > [proxy:0:0 at weiser1] got pmi command (from 15): get
>> > kvsname=kvs_24541_0 key=P7-businesscard
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser2$port#43116$ifname#192.168.0.102$
>> > [proxy:0:0 at weiser1] got pmi command (from 8): get
>> > kvsname=kvs_24541_0 key=P6-businesscard
>> > [proxy:0:0 at weiser1] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser2$port#33516$ifname#192.168.0.102$
>> > [proxy:0:1 at weiser2] got pmi command (from 5): get
>> > kvsname=kvs_24541_0 key=P1-businesscard
>> > [proxy:0:1 at weiser2] PMI response: cmd=get_result rc=0 msg=success
>> > value=description#weiser1$port#34048$ifname#192.168.0.101$
>> >
>> >
>> ===================================================================================
>> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > =   EXIT CODE: 9
>> > =   CLEANING UP REMAINING PROCESSES
>> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >
>> ===================================================================================
>> >
>> >
>> > ----------- END --------------
>> >
>> > in case that helps :(
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Jun 28, 2013 at 12:24 PM, Pavan Balaji <balaji at mcs.anl.gov>
>> wrote:
>> >>
>> >>
>> >> Looks like your application aborted for some reason.
>> >>
>> >>  -- Pavan
>> >>
>> >>
>> >> On 06/27/2013 10:21 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
>> >>>
>> >>> My bad, I just found out that there was a duplicate entry like:
>> >>> weiser1 127.0.1.1
>> >>> weiser1 192.168.0.101
>> >>> so I removed the 127.x.x.x entry and kept the hosts file contents
>> >>> similar on both nodes. Now the previous error is reduced to this one:
>> >>>
>> >>> ------ START OF OUTPUT -------
>> >>>
>> >>> ....some HPL startup string (no final result)
>> >>> ...skip.....
>> >>>
>> >>>
>> >>>
>> ===================================================================================
>> >>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >>> =   EXIT CODE: 9
>> >>> =   CLEANING UP REMAINING PROCESSES
>> >>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >>>
>> >>>
>> ===================================================================================
>> >>> [proxy:0:0 at weiser1] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
>> >>> [proxy:0:0 at weiser1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> >>> [proxy:0:0 at weiser1] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
>> >>> [mpiexec at weiser1] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
>> >>> [mpiexec at weiser1] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>> >>> [mpiexec at weiser1] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
>> >>> [mpiexec at weiser1] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>> >>>
>> >>> ------ END OF OUTPUT -------
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Jun 28, 2013 at 12:12 PM, Pavan Balaji <balaji at mcs.anl.gov
>> >>> <mailto:balaji at mcs.anl.gov>> wrote:
>> >>>
>> >>>
>> >>>     On 06/27/2013 10:08 PM, Syed. Jahanzeb Maqbool Hashmi wrote:
>> >>>
>> >>>
>> >>>
>> >>> P4-businesscard=description#weiser2$port#57651$ifname#192.168.0.102$
>> >>> P5-businesscard=description#weiser2$port#52622$ifname#192.168.0.102$
>> >>> P6-businesscard=description#weiser2$port#55935$ifname#192.168.0.102$
>> >>> P7-businesscard=description#weiser2$port#54952$ifname#192.168.0.102$
>> >>> P0-businesscard=description#weiser1$port#41958$ifname#127.0.1.1$
>> >>> P2-businesscard=description#weiser1$port#35049$ifname#127.0.1.1$
>> >>> P1-businesscard=description#weiser1$port#39634$ifname#127.0.1.1$
>> >>> P3-businesscard=description#weiser1$port#51802$ifname#127.0.1.1$
>> >>>
>> >>>
>> >>>
>> >>>     I have two concerns with your output.  Let's start with the first.
>> >>>
>> >>>     Did you look at this question on the FAQ page?
>> >>>
>> >>>     "Is your /etc/hosts file consistent across all nodes? Unless you
>> are
>> >>>     using an external DNS server, the /etc/hosts file on every machine
>> >>>     should contain the correct IP information about all hosts in the
>> >>>     system."
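
To make that FAQ item concrete: with the addresses seen in this log, a
consistent /etc/hosts on both weiser1 and weiser2 might look something like the
sketch below (the loopback line is an assumption; the names and addresses come
from the output above). In particular, neither node should map its own hostname
to 127.0.1.1.

    127.0.0.1       localhost
    192.168.0.101   weiser1
    192.168.0.102   weiser2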
>> >>>
>> >>>
>> >>>       -- Pavan
>> >>>
>> >>>     --
>> >>>     Pavan Balaji
>> >>>     http://www.mcs.anl.gov/~balaji
>> >>>
>> >>>
>> >>
>> >> --
>> >> Pavan Balaji
>> >> http://www.mcs.anl.gov/~balaji
>> >
>> >
>> >
>> > _______________________________________________
>> > discuss mailing list     discuss at mpich.org
>> > To manage subscription options or unsubscribe:
>> > https://lists.mpich.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss