[mpich-discuss] Running mpich3 on multiple machines

Doha Ehab dohaehab at gmail.com
Wed Jan 13 13:04:10 CST 2016


Hello,
I have cross-compiled MPICH 3.2, and it runs perfectly on a single machine,
but when I try it on two machines it hangs. (ssh between the machines works
on its own, without MPICH.) Here is the command I'm using. Any suggestions?
mpiexec.hydra -verbose --host sams,samplus -launcher ssh -n 2 -l hostname
host: sams
host: samplus

==================================================================================================
mpiexec options:
----------------
  Base path: /system/xbin/
  Launcher: ssh
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    ANDROID_ROOT=/system
    LD_LIBRARY_PATH=/vendor/lib:/system/lib
    PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin
    LOOP_MOUNTPOINT=/mnt/obb
    ASEC_MOUNTPOINT=/mnt/asec
    EXTERNAL_STORAGE2=/mnt/sdcard/external_sd
    BOOTCLASSPATH=/system/framework/core.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/core-junit.jar
    ANDROID_BOOTLOGO=1
    ANDROID_ASSETS=/system/app
    EXTERNAL_STORAGE=/mnt/sdcard
    ANDROID_DATA=/data
    USBHOST_STORAGE=/mnt/sdcard/usbStorage
    ANDROID_PROPERTY_WORKSPACE=9,65536

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: sams (1 cores)
      Exec list: hostname (1 processes);

      [2] proxy: samplus (1 cores)
      Exec list: hostname (1 processes);


==================================================================================================

[mpiexec at samplus] Timeout set to -1 (-1 means infinite)
[mpiexec at samplus] Got a control port string of samplus:36339

Proxy launch args: /system/xbin/hydra_pmi_proxy --control-port
samplus:36339 --debug --rmk user --launcher ssh --demux poll --pgid 0
--retries 10 --usize -2 --proxy-id

Arguments being passed to proxy 0:
--version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
--hostname sams --global-core-map 0,1,2 --pmi-id-map 0,0
--global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_3105_0
--pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
--global-inherited-env 13 'ANDROID_ROOT=/system'
'LD_LIBRARY_PATH=/vendor/lib:/system/lib'
'PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin'
'LOOP_MOUNTPOINT=/mnt/obb' 'ASEC_MOUNTPOINT=/mnt/asec'
'EXTERNAL_STORAGE2=/mnt/sdcard/external_sd'
'BOOTCLASSPATH=/system/framework/core.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/core-junit.jar'
'ANDROID_BOOTLOGO=1' 'ANDROID_ASSETS=/system/app'
'EXTERNAL_STORAGE=/mnt/sdcard' 'ANDROID_DATA=/data'
'USBHOST_STORAGE=/mnt/sdcard/usbStorage'
'ANDROID_PROPERTY_WORKSPACE=9,65536' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1
--exec-local-env 0 --exec-wdir / --exec-args 1 hostname

Arguments being passed to proxy 1:
--version 3.2 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
--hostname samplus --global-core-map 0,1,2 --pmi-id-map 0,1
--global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_3105_0
--pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
--global-inherited-env 13 'ANDROID_ROOT=/system'
'LD_LIBRARY_PATH=/vendor/lib:/system/lib'
'PATH=/sbin:/vendor/bin:/system/sbin:/system/bin:/system/xbin'
'LOOP_MOUNTPOINT=/mnt/obb' 'ASEC_MOUNTPOINT=/mnt/asec'
'EXTERNAL_STORAGE2=/mnt/sdcard/external_sd'
'BOOTCLASSPATH=/system/framework/core.jar:/system/framework/bouncycastle.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/core-junit.jar'
'ANDROID_BOOTLOGO=1' 'ANDROID_ASSETS=/system/app'
'EXTERNAL_STORAGE=/mnt/sdcard' 'ANDROID_DATA=/data'
'USBHOST_STORAGE=/mnt/sdcard/usbStorage'
'ANDROID_PROPERTY_WORKSPACE=9,65536' --global-user-env 0
--global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y'
--proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1
--exec-local-env 0 --exec-wdir / --exec-args 1 hostname

[mpiexec at samplus] Launch arguments: /system/xbin/ssh -x sams
"/system/xbin/hydra_pmi_proxy" --control-port samplus:36339 --debug --rmk
user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2
--proxy-id 0
[mpiexec at samplus] Launch arguments: /system/xbin/hydra_pmi_proxy
--control-port samplus:36339 --debug --rmk user --launcher ssh --demux poll
--pgid 0 --retries 10 --usize -2 --proxy-id 1
WARNING: Ignoring unknown argument '-x'
[1] samplus

Host 'sams' is not in the trusted hosts file.
(fingerprint md5 1c:c2:ee:33:9c:80:ae:9d:2c:b7:9f:7e:64:c2:87:e2)
Do you want to continue connecting? (y/n) y
^C[mpiexec at samplus] Sending Ctrl-C to processes as requested
[mpiexec at samplus] Press Ctrl-C again to force abort
[mpiexec at samplus] HYDU_sock_write (utils/sock/sock.c:286): write error (Bad
file descriptor)
[mpiexec at samplus] HYD_pmcd_pmiserv_send_signal
(pm/pmiserv/pmiserv_cb.c:169): unable to write data to proxy
[mpiexec at samplus] ui_cmd_cb (pm/pmiserv/pmiserv_pmci.c:79): unable to send
signal downstream
[mpiexec at samplus] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[mpiexec at samplus] HYD_pmci_wait_for_completion
(pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec at samplus] main (ui/mpich/mpiexec.c:344): process manager error
waiting for completion