[mpich-discuss] Error while using MPI_Send

Doha Ehab dohaehab at gmail.com
Tue Jul 26 12:01:01 CDT 2016


Changing the hostname and adding it, along with the IP address, to /etc/hosts
made it work.
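
For anyone who hits the same "No route to host" failure, the kind of
/etc/hosts entry Pavan suggested looks roughly like the lines below. The
hostname and address here are only illustrative, taken from the log further
down (host "tab", interface 192.168.1.4); substitute your own node's values.

    127.0.0.1      localhost
    192.168.1.4    tab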

Thanks
Doha

On Mon, Jul 25, 2016 at 8:41 PM, Balaji, Pavan <balaji at anl.gov> wrote:

>
> This sometimes happens if the node cannot resolve its own hostname.
> Can you try adding your hostname and IP to /etc/hosts?
>
>   - Pavan
>
> > On Jul 25, 2016, at 1:36 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov>
> wrote:
> >
> > Hi,
> >
> > > Normally this type of error would be the result of a firewall blocking
> communication, but your run is on a single node, so that shouldn't be the
> case. Since it did work at one point, I wonder if there is some bad state
> on your system that a reboot might clear.
> >
> > > It also looks like you are running on an Android system, which we do not
> have much experience with and have no good way to test ourselves, so our
> ability to help may be limited.
> >
> > Ken
> >
> > On 07/24/2016 11:50 AM, Doha Ehab wrote:
> >> Hello
> >>  I am using a cross-compiled version of MPICH3. I was trying a simple
> >> program containing MPI_Send and MPI_Recv; it was working, but suddenly I
> >> keep receiving the error message below. Can anyone point out what is
> >> wrong and how to fix it?
> >>
> >>  $ mpiexec -v -n 4 /data/parallelCode
> >> host: tab
> >>
> >>
> ==================================================================================================
> >> mpiexec options:
> >> ----------------
> >>   Base path: /system/xbin/
> >>   Launcher: (null)
> >>   Debug level: 1
> >>   Enable X: -1
> >>
> >>   Global environment:
> >>   -------------------
> >>     _=/system/xbin/mpiexec
> >>     PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
> >>     LOOP_MOUNTPOINT=/mnt/obb
> >>     ANDROID_ROOT=/system
> >>     SHELL=/system/bin/sh
> >>     ANDROID_DATA=/data
> >>     ANDROID_ASSETS=/system/app
> >>     TERM=vt100
> >>     ANDROID_PROPERTY_WORKSPACE=8,0
> >>     ANDROID_BOOTLOGO=1
> >>     HOSTNAME=hwt1701
> >>     LD_LIBRARY_PATH=/vendor/lib:/system/lib
> >>
> >>
> BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar
> >>     EMULATED_STORAGE_SOURCE=/mnt/shell/emulated
> >>     ANDROID_SOCKET_adbd=10
> >>     EMULATED_STORAGE_TARGET=/storage/emulated
> >>     ANDROID_STORAGE=/storage
> >>     MKSH=/system/bin/sh
> >>     EXTERNAL_STORAGE=/storage/emulated/legacy
> >>     USBHOST_STORAGE=/storage/usbdisk
> >>     RANDOM=11338
> >>     ASEC_MOUNTPOINT=/mnt/asec
> >>     SECONDARY_STORAGE=/storage/sdcard1
> >>     USER=shell
> >>     LEGACY_STORAGE=/storage/emulated/legacy
> >>     HOME=/data
> >>
> >>   Hydra internal environment:
> >>   ---------------------------
> >>     GFORTRAN_UNBUFFERED_PRECONNECTED=y
> >>
> >>
> >>     Proxy information:
> >>     *********************
> >>       [1] proxy: tab (1 cores)
> >>       Exec list: /data/mmp100 (4 processes);
> >>
> >>
> >>
> ==================================================================================================
> >>
> >> [mpiexec at tab] Timeout set to -1 (-1 means infinite)
> >> [mpiexec at tab] Got a control port string of tab:48661
> >>
> >> Proxy launch args: /system/xbin/hydra_pmi_proxy --control-port tab:48661
> >> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
> >> --usize -2 --proxy-id
> >>
> >> Arguments being passed to proxy 0:
> >> --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> >> --hostname tab --global-core-map 0,1,1 --pmi-id-map 0,0
> >> --global-process-count 4 --auto-cleanup 1 --pmi-kvsname kvs_10003_0
> >> --pmi-process-mapping (vector,(0,1,1)) --ckpoint-num -1
> >> --global-inherited-env 26 '_=/system/xbin/mpiexec'
> >> 'PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin'
> >> 'LOOP_MOUNTPOINT=/mnt/obb' 'ANDROID_ROOT=/system' 'SHELL=/system/bin/sh'
> >> 'ANDROID_DATA=/data' 'ANDROID_ASSETS=/system/app' 'TERM=vt100'
> >> 'ANDROID_PROPERTY_WORKSPACE=8,0' 'ANDROID_BOOTLOGO=1' 'HOSTNAME=hwt1701'
> >> 'LD_LIBRARY_PATH=/vendor/lib:/system/lib'
> >>
> 'BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar'
> >> 'EMULATED_STORAGE_SOURCE=/mnt/shell/emulated' 'ANDROID_SOCKET_adbd=10'
> >> 'EMULATED_STORAGE_TARGET=/storage/emulated' 'ANDROID_STORAGE=/storage'
> >> 'MKSH=/system/bin/sh' 'EXTERNAL_STORAGE=/storage/emulated/legacy'
> >> 'USBHOST_STORAGE=/storage/usbdisk' 'RANDOM=11338'
> >> 'ASEC_MOUNTPOINT=/mnt/asec' 'SECONDARY_STORAGE=/storage/sdcard1'
> >> 'USER=shell' 'LEGACY_STORAGE=/storage/emulated/legacy' 'HOME=/data'
> >> --global-user-env 0 --global-system-env 1
> >> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> >> --exec-appnum 0 --exec-proc-count 4 --exec-local-env 0 --exec-wdir /
> >> --exec-args 1 /data/mmp100
> >>
> >> [mpiexec at tab] Launch arguments: /system/xbin/hydra_pmi_proxy
> >> --control-port tab:48661 --debug --rmk user --launcher ssh --demux poll
> >> --pgid 0 --retries 10 --usize -2 --proxy-id 0
> >> [proxy:0:0 at tab] got pmi command (from 0): init
> >> pmi_version=1 pmi_subversion=1
> >> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> >> pmi_subversion=1 rc=0
> >> [proxy:0:0 at tab] got pmi command (from 0): get_maxes
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> >> vallen_max=1024
> >> [proxy:0:0 at tab] got pmi command (from 6): init
> >> pmi_version=1 pmi_subversion=1
> >> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> >> pmi_subversion=1 rc=0
> >> [proxy:0:0 at tab] got pmi command (from 6): get_maxes
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> >> vallen_max=1024
> >> [proxy:0:0 at tab] got pmi command (from 9): init
> >> pmi_version=1 pmi_subversion=1
> >> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> >> pmi_subversion=1 rc=0
> >> [proxy:0:0 at tab] got pmi command (from 15): init
> >> pmi_version=1 pmi_subversion=1
> >> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> >> pmi_subversion=1 rc=0
> >> [proxy:0:0 at tab] got pmi command (from 0): get_appnum
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> >> [proxy:0:0 at tab] got pmi command (from 9): get_maxes
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> >> vallen_max=1024
> >> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 15): get_maxes
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> >> vallen_max=1024
> >> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 9): get_appnum
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> >> [proxy:0:0 at tab] got pmi command (from 0): get
> >> kvsname=kvs_10003_0 key=PMI_process_mapping
> >> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> >> value=(vector,(0,1,1))
> >> [proxy:0:0 at tab] got pmi command (from 15): get_appnum
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> >> [proxy:0:0 at tab] got pmi command (from 6): get_appnum
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> >> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 0): put
> >> kvsname=kvs_10003_0 key=P0-businesscard
> >> value=port#49751$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] cached command:
> >> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> >> [proxy:0:0 at tab] got pmi command (from 9): get
> >> kvsname=kvs_10003_0 key=PMI_process_mapping
> >> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> >> value=(vector,(0,1,1))
> >> [proxy:0:0 at tab] got pmi command (from 0): barrier_in
> >>
> >> [proxy:0:0 at tab] got pmi command (from 6): get
> >> kvsname=kvs_10003_0 key=PMI_process_mapping
> >> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> >> value=(vector,(0,1,1))
> >> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
> >>
> >> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> >> [proxy:0:0 at tab] got pmi command (from 9): put
> >> kvsname=kvs_10003_0 key=P2-businesscard
> >> value=port#60729$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] cached command:
> >> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> >> [proxy:0:0 at tab] got pmi command (from 15): get
> >> kvsname=kvs_10003_0 key=PMI_process_mapping
> >> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> >> value=(vector,(0,1,1))
> >> [proxy:0:0 at tab] got pmi command (from 9): barrier_in
> >>
> >> [proxy:0:0 at tab] got pmi command (from 6): put
> >> kvsname=kvs_10003_0 key=P1-businesscard
> >> value=port#44344$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] cached command:
> >> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> >> [proxy:0:0 at tab] got pmi command (from 15): put
> >> kvsname=kvs_10003_0 key=P3-businesscard
> >> value=port#51326$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] cached command:
> >> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> >> [proxy:0:0 at tab] got pmi command (from 6): barrier_in
> >>
> >> [proxy:0:0 at tab] got pmi command (from 15): barrier_in
> >>
> >> [proxy:0:0 at tab] flushing 4 put command(s) out
> >> [mpiexec at tab] [pgid: 0] got PMI command: cmd=put
> >> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> >> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> >> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> >> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
> >> [proxy:0:0 at tab] forwarding command (cmd=put
> >> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> >> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> >> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> >> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$) upstream
> >> [proxy:0:0 at tab] forwarding command (cmd=barrier_in) upstream
> >> [mpiexec at tab] [pgid: 0] got PMI command: cmd=barrier_in
> >> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=keyval_cache
> >> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> >> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> >> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> >> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
> >> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=barrier_out
> >> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> >> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> >> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> >> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> >> [proxy:0:0 at tab] got pmi command (from 0): get
> >> kvsname=kvs_10003_0 key=P1-businesscard
> >> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> >> value=port#44344$description#tab$ifname#192.168.1.4$
> >> Fatal error in MPI_Send: Unknown error class, error stack:
> >> MPI_Send(174)...............................: MPI_Send(buf=0x15c56c,
> >> count=1, MPI_INT, dest=1, tag=1, MPI_COMM_WORLD) failed
> >> MPIDI_CH3i_Progress_wait(242)...............: an error occurred while
> >> handling an event returned by MPIDU_Sock_Wait()
> >> MPIDI_CH3I_Progress_handle_sock_event(697)..:
> >> MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to
> >> connnect to remote process
> >> MPIDU_Socki_handle_connect(808).............: connection failure
> >> (set=0,sock=1,errno=113:No route to host)
> >> [proxy:0:0 at tab] got pmi command (from 0): abort
> >> exitcode=69331543
> >> [proxy:0:0 at tab] we don't understand this command abort; forwarding
> upstream
> >> [mpiexec at tab] [pgid: 0] got PMI command: cmd=abort exitcode=69331543
> >>
> >>
> >>
> >>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

