[mpich-discuss] Error while using MPI_Send
Kenneth Raffenetti
raffenet at mcs.anl.gov
Mon Jul 25 13:36:47 CDT 2016
Hi,
Normally this type of error would be the result of a firewall blocking
communication, but your run is on a single node so that shouldn't be the
case. Since it did work at one point, I wonder if there is some bad
state on your system that a reboot might clear.
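If a reboot does not help, one way to narrow it down is to attempt,
outside of MPI, the same kind of TCP connect that MPICH's ch3:sock
channel performs between ranks. A minimal sketch, using the interface
address (192.168.1.4) and a port taken from the business cards in your
log; both are specific to a given run, so substitute the values your
own log reports:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    /* Address and port advertised in P0's business card in the log
     * below; both are illustrative and change from run to run. */
    const char *ip = "192.168.1.4";
    unsigned short port = 49751;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        /* errno 113 (EHOSTUNREACH) here would reproduce, outside of
         * MPI, the failure in the MPI_Send error stack below. */
        fprintf(stderr, "connect failed: errno=%d (%s)\n",
                errno, strerror(errno));
    else
        printf("connect succeeded\n");
    close(fd);
    return 0;
}

If this small program also reports errno 113, the problem is in the
device's network configuration rather than in MPICH itself.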
It also looks like you are running on an Android system, which we do
not have much experience with and have no good way to test ourselves,
so our ability to help may be limited.
Ken
On 07/24/2016 11:50 AM, Doha Ehab wrote:
> Hello
> I am using a cross-compiled version of MPICH 3. I was trying a simple
> program containing MPI_Send and MPI_Recv. It was working, but suddenly
> I keep receiving the error message below. Can anyone point out what is
> wrong and how to fix it?
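A minimal sketch of the kind of send/receive test described (the actual
source was not posted, so the variable names are illustrative; count=1,
MPI_INT, dest=1, and tag=1 match the MPI_Send error stack further down):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 sends one int to rank 1; this is the call that
         * fails in the trace below with "No route to host". */
        MPI_Send(&value, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}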
>
> $ mpiexec -v -n 4 /data/parallelCode
> host: tab
>
> ==================================================================================================
> mpiexec options:
> ----------------
> Base path: /system/xbin/
> Launcher: (null)
> Debug level: 1
> Enable X: -1
>
> Global environment:
> -------------------
> _=/system/xbin/mpiexec
> PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
> LOOP_MOUNTPOINT=/mnt/obb
> ANDROID_ROOT=/system
> SHELL=/system/bin/sh
> ANDROID_DATA=/data
> ANDROID_ASSETS=/system/app
> TERM=vt100
> ANDROID_PROPERTY_WORKSPACE=8,0
> ANDROID_BOOTLOGO=1
> HOSTNAME=hwt1701
> LD_LIBRARY_PATH=/vendor/lib:/system/lib
>
> BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar
> EMULATED_STORAGE_SOURCE=/mnt/shell/emulated
> ANDROID_SOCKET_adbd=10
> EMULATED_STORAGE_TARGET=/storage/emulated
> ANDROID_STORAGE=/storage
> MKSH=/system/bin/sh
> EXTERNAL_STORAGE=/storage/emulated/legacy
> USBHOST_STORAGE=/storage/usbdisk
> RANDOM=11338
> ASEC_MOUNTPOINT=/mnt/asec
> SECONDARY_STORAGE=/storage/sdcard1
> USER=shell
> LEGACY_STORAGE=/storage/emulated/legacy
> HOME=/data
>
> Hydra internal environment:
> ---------------------------
> GFORTRAN_UNBUFFERED_PRECONNECTED=y
>
>
> Proxy information:
> *********************
> [1] proxy: tab (1 cores)
> Exec list: /data/mmp100 (4 processes);
>
>
> ==================================================================================================
>
> [mpiexec at tab] Timeout set to -1 (-1 means infinite)
> [mpiexec at tab] Got a control port string of tab:48661
>
> Proxy launch args: /system/xbin/hydra_pmi_proxy --control-port tab:48661
> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
> --usize -2 --proxy-id
>
> Arguments being passed to proxy 0:
> --version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname tab --global-core-map 0,1,1 --pmi-id-map 0,0
> --global-process-count 4 --auto-cleanup 1 --pmi-kvsname kvs_10003_0
> --pmi-process-mapping (vector,(0,1,1)) --ckpoint-num -1
> --global-inherited-env 26 '_=/system/xbin/mpiexec'
> 'PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin'
> 'LOOP_MOUNTPOINT=/mnt/obb' 'ANDROID_ROOT=/system' 'SHELL=/system/bin/sh'
> 'ANDROID_DATA=/data' 'ANDROID_ASSETS=/system/app' 'TERM=vt100'
> 'ANDROID_PROPERTY_WORKSPACE=8,0' 'ANDROID_BOOTLOGO=1' 'HOSTNAME=hwt1701'
> 'LD_LIBRARY_PATH=/vendor/lib:/system/lib'
> 'BOOTCLASSPATH=/system/framework/core.jar:/system/framework/conscrypt.jar:/system/framework/okhttp.jar:/system/framework/core-junit.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/framework2.jar:/system/framework/hwframework.jar:/system/framework/hwcustframework.jar:/system/framework/telephony-common.jar:/system/framework/voip-common.jar:/system/framework/mms-common.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/apache-xml.jar:/system/framework/webviewchromium.jar:/system/framework/hwEmui.jar:/system/framework/hwServices.jar:/system/framework/hwAndroid.policy.jar:/system/framework/hwTelephony-common.jar:/system/framework/hwpadext.jar'
> 'EMULATED_STORAGE_SOURCE=/mnt/shell/emulated' 'ANDROID_SOCKET_adbd=10'
> 'EMULATED_STORAGE_TARGET=/storage/emulated' 'ANDROID_STORAGE=/storage'
> 'MKSH=/system/bin/sh' 'EXTERNAL_STORAGE=/storage/emulated/legacy'
> 'USBHOST_STORAGE=/storage/usbdisk' 'RANDOM=11338'
> 'ASEC_MOUNTPOINT=/mnt/asec' 'SECONDARY_STORAGE=/storage/sdcard1'
> 'USER=shell' 'LEGACY_STORAGE=/storage/emulated/legacy' 'HOME=/data'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 4 --exec-local-env 0 --exec-wdir /
> --exec-args 1 /data/mmp100
>
> [mpiexec at tab] Launch arguments: /system/xbin/hydra_pmi_proxy
> --control-port tab:48661 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --usize -2 --proxy-id 0
> [proxy:0:0 at tab] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at tab] got pmi command (from 0): get_maxes
>
> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at tab] got pmi command (from 6): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at tab] got pmi command (from 6): get_maxes
>
> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at tab] got pmi command (from 9): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at tab] got pmi command (from 15): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at tab] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at tab] got pmi command (from 0): get_appnum
>
> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at tab] got pmi command (from 9): get_maxes
>
> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 15): get_maxes
>
> [proxy:0:0 at tab] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at tab] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 9): get_appnum
>
> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at tab] got pmi command (from 0): get
> kvsname=kvs_10003_0 key=PMI_process_mapping
> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,1,1))
> [proxy:0:0 at tab] got pmi command (from 15): get_appnum
>
> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at tab] got pmi command (from 6): get_appnum
>
> [proxy:0:0 at tab] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 6): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 9): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 0): put
> kvsname=kvs_10003_0 key=P0-businesscard
> value=port#49751$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] cached command:
> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:0 at tab] got pmi command (from 9): get
> kvsname=kvs_10003_0 key=PMI_process_mapping
> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,1,1))
> [proxy:0:0 at tab] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at tab] got pmi command (from 6): get
> kvsname=kvs_10003_0 key=PMI_process_mapping
> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,1,1))
> [proxy:0:0 at tab] got pmi command (from 15): get_my_kvsname
>
> [proxy:0:0 at tab] PMI response: cmd=my_kvsname kvsname=kvs_10003_0
> [proxy:0:0 at tab] got pmi command (from 9): put
> kvsname=kvs_10003_0 key=P2-businesscard
> value=port#60729$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] cached command:
> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:0 at tab] got pmi command (from 15): get
> kvsname=kvs_10003_0 key=PMI_process_mapping
> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,1,1))
> [proxy:0:0 at tab] got pmi command (from 9): barrier_in
>
> [proxy:0:0 at tab] got pmi command (from 6): put
> kvsname=kvs_10003_0 key=P1-businesscard
> value=port#44344$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] cached command:
> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:0 at tab] got pmi command (from 15): put
> kvsname=kvs_10003_0 key=P3-businesscard
> value=port#51326$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] cached command:
> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:0 at tab] got pmi command (from 6): barrier_in
>
> [proxy:0:0 at tab] got pmi command (from 15): barrier_in
>
> [proxy:0:0 at tab] flushing 4 put command(s) out
> [mpiexec at tab] [pgid: 0] got PMI command: cmd=put
> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
> [proxy:0:0 at tab] forwarding command (cmd=put
> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$) upstream
> [proxy:0:0 at tab] forwarding command (cmd=barrier_in) upstream
> [mpiexec at tab] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=keyval_cache
> P0-businesscard=port#49751$description#tab$ifname#192.168.1.4$
> P2-businesscard=port#60729$description#tab$ifname#192.168.1.4$
> P1-businesscard=port#44344$description#tab$ifname#192.168.1.4$
> P3-businesscard=port#51326$description#tab$ifname#192.168.1.4$
> [mpiexec at tab] PMI response to fd 6 pid 15: cmd=barrier_out
> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> [proxy:0:0 at tab] PMI response: cmd=barrier_out
> [proxy:0:0 at tab] got pmi command (from 0): get
> kvsname=kvs_10003_0 key=P1-businesscard
> [proxy:0:0 at tab] PMI response: cmd=get_result rc=0 msg=success
> value=port#44344$description#tab$ifname#192.168.1.4$
> Fatal error in MPI_Send: Unknown error class, error stack:
> MPI_Send(174)...............................: MPI_Send(buf=0x15c56c,
> count=1, MPI_INT, dest=1, tag=1, MPI_COMM_WORLD) failed
> MPIDI_CH3i_Progress_wait(242)...............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(697)..:
> MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to
> connnect to remote process
> MPIDU_Socki_handle_connect(808).............: connection failure
> (set=0,sock=1,errno=113:No route to host)
> [proxy:0:0 at tab] got pmi command (from 0): abort
> exitcode=69331543
> [proxy:0:0 at tab] we don't understand this command abort; forwarding upstream
> [mpiexec at tab] [pgid: 0] got PMI command: cmd=abort exitcode=69331543
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss