[mpich-discuss] Error while using MPI_Send
Balaji, Pavan
balaji at anl.gov
Mon Jul 25 13:41:30 CDT 2016
This sometimes happens if the node cannot resolve its own hostname. Can you try adding your hostname and IP address to /etc/hosts?
- Pavan
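
One way to check the suggestion above before editing /etc/hosts is to ask the system resolver whether the node's own hostname maps to an address. The following is only a minimal sketch, not something posted in the thread: it assumes a Python 3 interpreter is available on the device (which may not be the case on Android), and the helper name resolve_host is hypothetical. An /etc/hosts entry generally takes the form "<IP address> <hostname>".

```python
import socket


def resolve_host(name):
    """Return the IPv4 address that `name` resolves to, or None on failure.

    Hypothetical helper for illustration; not part of MPICH or Hydra.
    """
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


if __name__ == "__main__":
    # Check the name this node reports for itself.
    name = socket.gethostname()
    addr = resolve_host(name)
    if addr is None:
        print(f"{name}: does not resolve; consider adding it to /etc/hosts")
    else:
        print(f"{name} resolves to {addr}")
```

If the hostname does not resolve, or resolves to an address that is no longer assigned to any interface, that would be consistent with the "No route to host" failure in the log quoted below, though this sketch cannot confirm the cause.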
> On Jul 25, 2016, at 1:36 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
>
> Hi,
>
> Normally this type of error would be the result of a firewall blocking communication, but your run is on a single node so that shouldn't be the case. I wonder, since it did work at one point, if there is some bad state on your system that might be cleared with a reboot?
>
> It also looks like you are running on an Android system, which we do not have much experience with and have no good way to test ourselves, so our ability to help may be limited.
>
> Ken
>
> On 07/24/2016 11:50 AM, Doha Ehab wrote:
>> Hello
>> I am using a cross-compiled version of MPICH3. I was trying a simple
>> program containing MPI_Send and MPI_Recv. It was working, but suddenly I keep
>> receiving this error message. Can anyone point out what is wrong and
>> how to fix it?
>>
>> $ mpiexec -v -n 4 /data/parallelCode
>> host: tab
>>
>> ==================================================================================================
>> mpiexec options:
>> ----------------
>> Base path: /system/xbin/
>> Launcher: (null)
>> Debug level: 1
>> Enable X: -1
>>
>> Global environment:
>> -------------------
>> _=/system/xbin/mpiexec
>> PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
>> LOOP_MOUNTPOINT=/mnt/obb
>> ANDROID_ROOT=/system
>> SHELL=/system/bin/sh
>> ANDROID_DATA=/data
>> ANDROID_ASSETS=/system/app
>> TERM=vt100
>> ANDROID_PROPERTY_WORKSPACE=8,0
>> ANDROID_BOOTLOGO=1
>> HOSTNAME=hwt1701
>> LD_LIBRARY_PATH=/vendor/lib:/system/lib
>>
>> BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar
>> EMULATED_STORAGE_SOURCE=/mnt/shell/emulated
>> ANDROID_SOCKET_adbd=10
>> EMULATED_STORAGE_TARGET=/storage/emulated
>> ANDROID_STORAGE=/storage
>> MKSH=/system/bin/sh
>> EXTERNAL_STORAGE=/storage/emulated/legacy
>> USBHOST_STORAGE=/storage/usbdisk
>> RANDOM=11338
>> ASEC_MOUNTPOINT=/mnt/asec
>> SECONDARY_STORAGE=/storage/sdcard1
>> USER=shell
>> LEGACY_STORAGE=/storage/emulated/legacy
>> HOME=/data
>>
>> Hydra internal environment:
>> ---------------------------
>> GFORTRAN_UNBUFFERED_PRECONNECTED=y
>>
>>
>> Proxy information:
>> *********************
>> [1] proxy: tab (1 cores)
>> Exec list: /data/mmp100 (4 processes);
>>
>>
>> ==================================================================================================
>>
>> [mpiexec at tab] Timeout set to -1 (-1 means infinite)
>> [mpiexec at tab] Got a control port string of tab:48661
>>
>> Proxy launch args: /system/xbin/hydra_pmi_proxy --control-port tab:48661
>> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
>> --usize -2 --proxy-id
>>
>> Arguments being passed to proxy 0:
>> --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
>> --hostname tab --global-core-map 0,1,1 --pmi-id-map 0,0
>> --global-process-count 4 --auto-cleanup 1 --pmi-kvsname kvs_10003_0
>> --pmi-process-mapping (vector,(0,1,1)) --ckpoint-num -1
>> --global-inherited-env 26 '_=/system/xbin/mpiexec'
>> 'PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin'
>> 'LOOP_MOUNTPOINT=/mnt/obb' 'ANDROID_ROOT=/system' 'SHELL=/system/bin/sh'
>> 'ANDROID_DATA=/data' 'ANDROID_ASSETS=/system/app' 'TERM=vt100'
>> 'ANDROID_PROPERTY_WORKSPACE=8,0' 'ANDROID_BOOTLOGO=1' 'HOSTNAME=hwt1701'
>> 'LD_LIBRARY_PATH=/vendor/lib:/system/lib'
>> 'BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar'
>> 'EMULATED_STORAGE_SOURCE=/mnt/shell/emulated' 'ANDROID_SOCKET_adbd=10'
>> 'EMULATED_STORAGE_TARGET=/storage/emulated' 'ANDROID_STORAGE=/storage'
>> 'MKSH=/system/bin/sh' 'EXTERNAL_STORAGE=/storage/emulated/legacy'
>> 'USBHOST_STORAGE=/storage/usbdisk' 'RANDOM=11338'
>> 'ASEC_MOUNTPOINT=/mnt/asec' 'SECONDARY_STORAGE=/storage/sdcard1'
>> 'USER=shell' 'LEGACY_STORAGE=/storage/emulated/legacy' 'HOME=/data'
>> --global-user-env 0 --global-system-env 1
>> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
>> --exec-appnum 0 --exec-proc-count 4 --exec-local-env 0 --exec-wdir /
>> --exec-args 1 /data/mmp100
>>
>> [mpiexec at tab] Launch arguments: /system/xbin/hydra_pmi_proxy
>> --control-port tab:48661 --debug --rmk user --launcher ssh --demux poll
>> --pgid 0 --retries 10 --usize -2 --proxy-id 0
>> [proxy:0:0 at tab] got pmi command (from 0): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 0): get_maxes
>>
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 6): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 6): get_maxes
>>
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 9): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 15): init
>> pmi_version=1 pmi_subversion=1
>> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
>> pmi_subversion=1 rc=0
>> [proxy:0:0 at tab] got pmi command (from 0): get_appnum
>>
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 9): get_maxes
>>
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 15): get_maxes
>>
>> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
>> vallen_max=1024
>> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 9): get_appnum
>>
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 0): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 15): get_appnum
>>
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 6): get_appnum
>>
>> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
>> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 0): put
>> kvsname=kvs_10003_0 key=P0-businesscard
>> value=port#49751$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 9): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 0): barrier_in
>>
>> [proxy:0:0 at tab] got pmi command (from 6): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
>>
>> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
>> [proxy:0:0 at tab] got pmi command (from 9): put
>> kvsname=kvs_10003_0 key=P2-businesscard
>> value=port#60729$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 15): get
>> kvsname=kvs_10003_0 key=PMI_process_mapping
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=(vector,(0,1,1))
>> [proxy:0:0 at tab] got pmi command (from 9): barrier_in
>>
>> [proxy:0:0 at tab] got pmi command (from 6): put
>> kvsname=kvs_10003_0 key=P1-businesscard
>> value=port#44344$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 15): put
>> kvsname=kvs_10003_0 key=P3-businesscard
>> value=port#51326$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] cached command:
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
>> [proxy:0:0 at tab] got pmi command (from 6): barrier_in
>>
>> [proxy:0:0 at tab] got pmi command (from 15): barrier_in
>>
>> [proxy:0:0 at tab] flushing 4 put command(s) out
>> [mpiexec at tab] [pgid: 0] got PMI command: cmd=put
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
>> [proxy:0:0 at tab] forwarding command (cmd=put
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$) upstream
>> [proxy:0:0 at tab] forwarding command (cmd=barrier_in) upstream
>> [mpiexec at tab] [pgid: 0] got PMI command: cmd=barrier_in
>> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=keyval_cache
>> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
>> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
>> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
>> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
>> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] PMI response: cmd=barrier_out
>> [proxy:0:0 at tab] got pmi command (from 0): get
>> kvsname=kvs_10003_0 key=P1-businesscard
>> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
>> value=port#44344$description#tab$ifname#192.168.1.4$
>> Fatal error in MPI_Send: Unknown error class, error stack:
>> MPI_Send(174)...............................: MPI_Send(buf=0x15c56c,
>> count=1, MPI_INT, dest=1, tag=1, MPI_COMM_WORLD) failed
>> MPIDI_CH3i_Progress_wait(242)...............: an error occurred while
>> handling an event returned by MPIDU_Sock_Wait()
>> MPIDI_CH3I_Progress_handle_sock_event(697)..:
>> MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to
>> connnect to remote process
>> MPIDU_Socki_handle_connect(808).............: connection failure
>> (set=0,sock=1,errno=113:No route to host)
>> [proxy:0:0 at tab] got pmi command (from 0): abort
>> exitcode=69331543
>> [proxy:0:0 at tab] we don't understand this command abort; forwarding upstream
>> [mpiexec at tab] [pgid: 0] got PMI command: cmd=abort exitcode=69331543
>>
>>
>>
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>