[mpich-discuss] mpi hello-world error

Niyaz Murshed Niyaz.Murshed at arm.com
Sat Jun 15 00:08:44 CDT 2024


Hello,

I am trying to run the example hellow.c across 2 servers.
I can run it on each machine individually and it works fine.
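
For reference, the program is essentially the stock MPI hello world. A minimal sketch of what I am running (the actual hellow.c shipped in the MPICH examples directory may differ slightly):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size;

    /* Initialize MPI, then query this process's rank and the total number of processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}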

10.118.91.158  is the machine I am running on.
10.118.91.159 is the remote machine.
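
Both servers have MPICH installed under /opt/mpich and the example under /mpich/examples. The binary was produced with the MPICH compiler wrapper, roughly as follows (the exact invocation may differ, but this is the usual form):

cd /mpich/examples
mpicc hellow.c -o a.out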


root@dpr740:/mpich/examples# mpirun -n 2 -hosts 10.118.91.158 ./a.out

Hello world from process 0 of 2

Hello world from process 1 of 2



root@dpr740:/mpich/examples# mpirun -n 2 -hosts 10.118.91.159 ./a.out

Hello world from process 1 of 2

Hello world from process 0 of 2



However, when I try to run across both hosts, I get the error below.

realloc(): invalid pointer



Is this a known issue? Any suggestions?
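
The full command and the -verbose output are below. For what it is worth, I would expect the hostfile form of the same launch to be equivalent (a sketch, assuming Hydra's usual -f option, with the hosts listed in the same order as in -hosts):

cat > hosts <<EOF
10.118.91.159
10.118.91.158
EOF
mpirun -f hosts -n 2 ./a.out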





root@dpr740:/mpich/examples# mpirun -verbose -n 2 -hosts 10.118.91.159,10.118.91.158 ./a.out

host: 10.118.91.159

host: 10.118.91.158

[mpiexec at dpr740] Timeout set to -1 (-1 means infinite)



==================================================================================================

mpiexec options:

----------------

  Base path: /opt/mpich/bin/

  Launcher: (null)

  Debug level: 1

  Enable X: -1



  Global environment:

  -------------------

    PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig

    HOSTNAME=dpr740

    HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233

    PWD=/mpich/examples

    HOME=/root

    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:

    LESSCLOSE=/usr/bin/lesspipe %s %s

    TERM=xterm

    LESSOPEN=| /usr/bin/lesspipe %s

    SHLVL=1

    LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib

    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin

    _=/opt/mpich/bin/mpirun

    OLDPWD=/



  Hydra internal environment:

  ---------------------------

    GFORTRAN_UNBUFFERED_PRECONNECTED=y





    Proxy information:

    *********************

      [1] proxy: 10.118.91.159 (1 cores)

      Exec list: ./a.out (1 processes);



      [2] proxy: 10.118.91.158 (1 cores)

      Exec list: ./a.out (1 processes);





==================================================================================================





Proxy launch args: /opt/mpich/bin/hydra_pmi_proxy --control-port 10.118.91.159:33909 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id



Arguments being passed to proxy 0:

--version 4.3.0a1 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname 10.118.91.159 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_844_0_801938186_dpr740 --pmi-process-mapping (vector,(0,2,1)) --global-inherited-env 14 'PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig' 'HOSTNAME=dpr740' 'HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233' 'PWD=/mpich/examples' 'HOME=/root' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'TERM=xterm' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SHLVL=1' 'LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib' 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin' '_=/opt/mpich/bin/mpirun' 'OLDPWD=/' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /mpich/examples --exec-args 1 ./a.out



Arguments being passed to proxy 1:

--version 4.3.0a1 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname 10.118.91.158 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_844_0_801938186_dpr740 --pmi-process-mapping (vector,(0,2,1)) --global-inherited-env 14 'PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig' 'HOSTNAME=dpr740' 'HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233' 'PWD=/mpich/examples' 'HOME=/root' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'TERM=xterm' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SHLVL=1' 'LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib' 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin' '_=/opt/mpich/bin/mpirun' 'OLDPWD=/' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /mpich/examples --exec-args 1 ./a.out



[mpiexec at dpr740] Launch arguments: /opt/mpich/bin/hydra_pmi_proxy --control-port 10.118.91.159:33909 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id 0

[mpiexec at dpr740] Launch arguments: /usr/bin/ssh -x -p 2233 10.118.91.158 "/opt/mpich/bin/hydra_pmi_proxy" --control-port 10.118.91.159:33909 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id 1

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PID_LIST

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=init pmi_version=1 pmi_subversion=1

[proxy:0 at dpr740] Sending PMI command:

    cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get_maxes

[proxy:0 at dpr740] Sending PMI command:

    cmd=maxes rc=0 kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get_appnum

[proxy:0 at dpr740] Sending PMI command:

    cmd=appnum rc=0 appnum=0

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get_my_kvsname

[proxy:0 at dpr740] Sending PMI command:

    cmd=my_kvsname rc=0 kvsname=kvs_844_0_801938186_dpr740

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_process_mapping

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=(vector,(0,2,1)) found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_hwloc_xmlfile

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=/tmp/hydra_hwloc_xmlfile_CeNRJN found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=get_result rc=1

[proxy:0 at dpr740] we don't understand the response get_result; forwarding downstream

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PID_LIST

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=init pmi_version=1 pmi_subversion=1

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_maxes

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=maxes rc=0 kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_appnum

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=appnum rc=0 appnum=0

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_my_kvsname

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=my_kvsname rc=0 kvsname=kvs_844_0_801938186_dpr740

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_process_mapping

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=(vector,(0,2,1)) found=TRUE

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_hwloc_xmlfile

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=/tmp/hydra_hwloc_xmlfile_xv8EIG found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds



[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=get_result rc=1

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] we don't understand the response get_result; forwarding downstream

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=barrier_out

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=barrier_out

[proxy:0 at dpr740] Sending PMI command:

    cmd=barrier_out

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=barrier_out

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=put kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-0 value=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] [proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=put kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-1 value=0200A8BFC0A80101[8]

cached command: -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] Sending PMI command:

    cmd=put_result rc=0

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:0 at dpr740] flushing 1 put command(s) out

[proxy:0 at dpr740] forwarding command upstream:

cmd=mput -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=mput -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] cached command: -allgather-shm-1-1=0200A8BFC0A80101[8]

[proxy:1 at ampere-altra-2-1] Sending PMI command:

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=mput -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]



    cmd=put_result rc=0

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=mput -allgather-shm-1-1=0200A8BFC0A80101[8]



[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] flushing 1 put command(s) out

[proxy:1 at ampere-altra-2-1] forwarding command upstream:

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=keyval_cache -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] -allgather-shm-1-1=0200A8BFC0A80101[8]

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=keyval_cache -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] -allgather-shm-1-1=0200A8BFC0A80101[8]

[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=barrier_out

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=barrier_out

[proxy:0 at dpr740] Sending PMI command:

    cmd=barrier_out

cmd=mput -allgather-shm-1-1=0200A8BFC0A80101[8]

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=mput -allgather-shm-1-1=0200A8BFC0A80101[8]

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-0

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-1

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=0200A8BFC0A80101[8] found=TRUE

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=barrier_out

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-0

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] found=TRUE

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-1

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=0200A8BFC0A80101[8] found=TRUE

realloc(): invalid pointer

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_STDERR



===================================================================================

=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

=   PID 2404 RUNNING AT 10.118.91.158

=   EXIT CODE: 134

=   CLEANING UP REMAINING PROCESSES

=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===================================================================================

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_EXIT_STATUS

[proxy:0 at dpr740] HYD_pmcd_pmip_control_cmd_cb (proxy/pmip_cb.c:484): assert (!closed) failed

[proxy:0 at dpr740] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status

[proxy:0 at dpr740] main (proxy/pmip.c:122): demux engine error waiting for event

[mpiexec at dpr740] HYDT_bscu_wait_for_completion (lib/tools/bootstrap/utils/bscu_wait.c:109): one of the processes terminated badly; aborting

[mpiexec at dpr740] HYDT_bsci_wait_for_completion (lib/tools/bootstrap/src/bsci_wait.c:21): launcher returned error waiting for completion

[mpiexec at dpr740] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:189): launcher returned error waiting for completion

[mpiexec at dpr740] main (mpiexec/mpiexec.c:260): process manager error waiting for completion




Regards,
Niyaz