[mpich-discuss] Error while using MPI_Send

Balaji, Pavan balaji at anl.gov
Mon Jul 25 13:41:30 CDT 2016


This sometimes happens if the node cannot resolve its own hostname.  Can you try adding your hostname and IP address to /etc/hosts?
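
A minimal sketch of that check, assuming Python is available on the device (it may not be on a stock Android system). The helper names `resolve_ipv4` and `hosts_line` are made up for illustration, and the address 192.168.1.4 is only taken from the `ifname` field of the business cards in your log, so substitute whatever is correct for your setup:

```python
import socket


def resolve_ipv4(hostname):
    """Return the first IPv4 address the resolver reports for hostname, or None."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # The resolver could not map the name to an address.
        return None
    return infos[0][4][0] if infos else None


def hosts_line(ip, hostname):
    """Format one /etc/hosts entry: address, a tab, then the name."""
    return f"{ip}\t{hostname}"


if __name__ == "__main__":
    name = socket.gethostname()
    addr = resolve_ipv4(name)
    if addr is None:
        # Assumed address from the posted log; replace with the real one.
        print(f"{name} does not resolve; a candidate /etc/hosts entry would be:")
        print(hosts_line("192.168.1.4", name))
    else:
        print(f"{name} resolves to {addr}")
```

If the name already resolves, it is worth comparing the reported address with the one in the business cards; a mismatch or a stale address could be another thing to look at.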

  - Pavan

> On Jul 25, 2016, at 1:36 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
> 
> Hi,
> 
> Normally this type of error would be the result of a firewall blocking communication, but your run is on a single node so that shouldn't be the case. I wonder, since it did work at one point, if there is some bad state on your system that might be cleared with a reboot?
> 
> It also looks like you are running on an Android system, which we do not have much experience with and have no good way to test ourselves, so our ability to help may be limited.
> 
> Ken
> 
> On 07/24/2016 11:50 AM, Doha Ehab wrote:
>> Hello,
>>  I am using a cross-compiled version of MPICH3. I was trying a simple
>> program containing MPI_Send and MPI_Recv. It was working, but suddenly I keep
>> receiving this error message. Can anyone point out what is wrong and
>> how to fix it?
>> 
>>  $ mpiexec -v -n 4 /data/parallelCode
>> host: tab
>> 
>> ==================================================================================================
>> mpiexec options:
>> ----------------
>>   Base path: /system/xbin/
>>   Launcher: (null)
>>   Debug level: 1
>>   Enable X: -1
>> 
>>   Global environment:
>>   -------------------
>>     _=/system/xbin/mpiexec
>>     PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
>>     LOOP_MOUNTPOINT=/mnt/obb
>>     ANDROID_ROOT=/system
>>     SHELL=/system/bin/sh
>>     ANDROID_DATA=/data
>>     ANDROID_ASSETS=/system/app
>>     TERM=vt100
>>     ANDROID_PROPERTY_WORKSPACE=8,0
>>     ANDROID_BOOTLOGO=1
>>     HOSTNAME=hwt1701
>>     LD_LIBRARY_PATH=/vendor/lib:/system/lib
>> 
>> BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar
>>     EMULATED_STORAGE_SOURCE=/mnt/shell/emulated
>>     ANDROID_SOCKET_adbd=10
>>     EMULATED_STORAGE_TARGET=/storage/emulated
>>     ANDROID_STORAGE=/storage
>>     MKSH=/system/bin/sh
>>     EXTERNAL_STORAGE=/storage/emulated/legacy
>>     USBHOST_STORAGE=/storage/usbdisk
>>     RANDOM=11338
>>     ASEC_MOUNTPOINT=/mnt/asec
>>     SECONDARY_STORAGE=/storage/sdcard1
>>     USER=shell
>>     LEGACY_STORAGE=/storage/emulated/legacy
>>     HOME=/data
>> 
>>   Hydra internal environment:
>>   ---------------------------
>>     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>> 
>> 
>>     Proxy information:
>>     *********************
>>       [1] proxy: tab (1 cores)
>>       Exec list: /data/mmp100 (4 processes);
>> 
>> 
>> ==================================================================================================
>> 
>> [mpiexec at tab] Timeout set to -1 (-1 means infinite)
>> [mpiexec at tab] Got a control port string of tab:48661
>> 
>> Proxy launch args: /system/xbin/hydra_pmi_proxy --control-port tab:48661
>> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
>> --usize -2 --proxy-id
>> 
>> Arguments being passed to proxy 0:
>> --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
>> --hostname tab --global-core-map 0,1,1 --pmi-id-map 0,0
>> --global-process-count 4 --auto-cleanup 1 --pmi-kvsname kvs_10003_0
>> --pmi-process-mapping (vector,(0,1,1)) --ckpoint-num -1
>> --global-inherited-env 26 '_=/system/xbin/mpiexec'
>> 'PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin'
>> 'LOOP_MOUNTPOINT=/mnt/obb' 'ANDROID_ROOT=/system' 'SHELL=/system/bin/sh'
>> 'ANDROID_DATA=/data' 'ANDROID_ASSETS=/system/app' 'TERM=vt100'
>> 'ANDROID_PROPERTY_WORKSPACE=8,0' 'ANDROID_BOOTLOGO=1' 'HOSTNAME=hwt1701'
>> 'LD_LIBRARY_PATH=/vendor/lib:/system/lib'
>> 'BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar'
>> 'EMULATED_STORAGE_SOURCE=/mnt/shell/emulated' 'ANDROID_SOCKET_adbd=10'
>> 'EMULATED_STORAGE_TARGET=/storage/emulated' 'ANDROID_STORAGE=/storage'
>> 'MKSH=/system/bin/sh' 'EXTERNAL_STORAGE=/storage/emulated/legacy'
>> 'USBHOST_STORAGE=/storage/usbdisk' 'RANDOM=11338'
>> 'ASEC_MOUNTPOINT=/mnt/asec' 'SECONDARY_STORAGE=/storage/sdcard1'
>> 'USER=shell' 'LEGACY_STORAGE=/storage/emulated/legacy' 'HOME=/data'
>> --global-user-env 0 --global-system-env 1
>> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
>> --exec-appnum 0 --exec-proc-count 4 --exec-local-env 0 --exec-wdir /
>> --exec-args 1 /data/mmp100
>> 
>> [mpiexec at tab] Launch arguments: /system/xbin/hydra_pmi_proxy
>> --control-port tab:48661 --debug --rmk user --launcher ssh --demux poll
>> --pgid 0 --retries 10 --usize -2 --proxy-id 0
>> [proxy:0:0 at tab] got pmi command (from 0): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 0): get_maxes
>> 
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 6): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 6): get_maxes
>> 
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 9): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 15): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 0): get_appnum
>> 
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 9): get_maxes
>> 
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 15): get_maxes
>> 
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 9): get_appnum
>> 
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 0): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 15): get_appnum
>> 
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 6): get_appnum
>> 
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 0): put
>> kvsname=kvs_10003_0 key=P0-businesscard
>> value=port#49751$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 9): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 0): barrier_in
>> 
>> [proxy:0:0 at tab] got pmi command (from 6): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
>> 
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 9): put
>> kvsname=kvs_10003_0 key=P2-businesscard
>> value=port#60729$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 15): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 9): barrier_in
>> 
>> [proxy:0:0 at tab] got pmi command (from 6): put
>> kvsname=kvs_10003_0 key=P1-businesscard
>> value=port#44344$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 15): put
>> kvsname=kvs_10003_0 key=P3-businesscard
>> value=port#51326$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 6): barrier_in
>> 
>> [proxy:0:0 at tab] got pmi command (from 15): barrier_in
>> 
>> [proxy:0:0 at tab] flushing 4 put command(s) out
>> [mpiexec at tab] [pgid: 0] got PMI command: cmd=put
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] forwarding command (cmd=put
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$) upstream
>> [proxy:0:0 at tab] forwarding command (cmd=barrier_in) upstream
>> [mpiexec at tab] [pgid: 0] got PMI command: cmd=barrier_in
>> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=keyval_cache
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
>> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] got pmi command (from 0): get
>> kvsname=kvs_10003_0 key=P1-businesscard
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=port#44344$description#tab$ifname#192.168.1.4$
>> Fatal error in MPI_Send: Unknown error class, error stack:
>> MPI_Send(174)...............................: MPI_Send(buf=0x15c56c,
>> count=1, MPI_INT, dest=1, tag=1, MPI_COMM_WORLD) failed
>> MPIDI_CH3i_Progress_wait(242)...............: an error occurred while
>> handling an event returned by MPIDU_Sock_Wait()
>> MPIDI_CH3I_Progress_handle_sock_event(697)..:
>> MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to
>> connnect to remote process
>> MPIDU_Socki_handle_connect(808).............: connection failure
>> (set=0,sock=1,errno=113:No route to host)
>> [proxy:0:0 at tab] got pmi command (from 0): abort
>> exitcode=69331543
>> [proxy:0:0 at tab] we don't understand this command abort; forwarding upstream
>> [mpiexec at tab] [pgid: 0] got PMI command: cmd=abort exitcode=69331543
>> 
>> 
>> 
>> 
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>> 
