[mpich-discuss] mpi hello-world error

Niyaz Murshed Niyaz.Murshed at arm.com
Tue Jun 18 09:18:34 CDT 2024


Awesome, Hui.
I will give it a try today.
What value did you set for the env variable PMI_HOSTNAME?
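(A minimal, hypothetical sketch of setting such a variable, assuming it simply takes the address of the interface to use; the value Hui actually used is not shown in this thread, and 10.118.91.158 is just one of the host addresses from the runs below.)

   # hypothetical: export the variable before launching; 10.118.91.158 is a placeholder value
   export PMI_HOSTNAME=10.118.91.158
   mpirun -n 5 -hosts 10.118.91.158,10.118.91.159 ./a.out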

From: Zhou, Hui <zhouh at anl.gov>
Date: Tuesday, June 18, 2024 at 8:56 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error
Niyaz,

For your information, I opened a pull request to fix the interface selection issue: https://github.com/pmodels/mpich/pull/7027

Hui
________________________________
From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Monday, June 17, 2024 12:46 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error


Spoke too soon….

This time, it's the other error I was seeing:





root@dpr740:/mpich/examples# mpirun -n 5 -hosts 10.118.91.158,10.118.91.159 ./a.out

==== Various sizes and limits ====

sizeof(MPIDI_per_vci_t): 192

Required minimum FI_VERSION: 0, current version: 10015

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 192.168.1.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 10.118.91.158

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] ::1

provider: shm, score = 4, pref = -2, FI_ADDR_STR [14] - fi_shm://1672

provider: shm, score = 4, pref = -2, FI_ADDR_STR [14] - fi_shm://1672

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bd

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1353

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [13] - fi_sm2://1672

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [13] - fi_sm2://1672

Required minimum FI_VERSION: 10005, current version: 10015

==== Capability set configuration ====

libfabric provider: sockets - fe80::/64

MPIDI_OFI_ENABLE_DATA: 1

MPIDI_OFI_ENABLE_AV_TABLE: 1

MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 1

MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0

MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0

MPIDI_OFI_ENABLE_MR_ALLOCATED: 0

MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1

MPIDI_OFI_ENABLE_MR_PROV_KEY: 0

MPIDI_OFI_ENABLE_TAGGED: 1

MPIDI_OFI_ENABLE_AM: 1

MPIDI_OFI_ENABLE_RMA: 1

MPIDI_OFI_ENABLE_ATOMICS: 1

MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1

MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1

MPIDI_OFI_ENABLE_TRIGGERED: 0

MPIDI_OFI_ENABLE_HMEM: 0

MPIDI_OFI_NUM_AM_BUFFERS: 8

MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0

MPIDI_OFI_CONTEXT_BITS: 20

MPIDI_OFI_SOURCE_BITS: 0

MPIDI_OFI_TAG_BITS: 31

MPIDI_OFI_VNI_USE_DOMAIN: 1

MAXIMUM SUPPORTED RANKS: 4294967296

MAXIMUM TAG: 2147483648

==== Provider global thresholds ====

max_buffered_send: 255

max_buffered_write: 255

max_msg_size: 9223372036854775807

max_order_raw: -1

max_order_war: -1

max_order_waw: -1

tx_iov_limit: 8

rx_iov_limit: 8

rma_iov_limit: 8

max_mr_key_size: 8

==== Various sizes and limits ====

MPIDI_OFI_AM_MSG_HEADER_SIZE: 24

MPIDI_OFI_MAX_AM_HDR_SIZE: 255

sizeof(MPIDI_OFI_am_request_header_t): 416

sizeof(MPIDI_OFI_per_vci_t): 52480

MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024

MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384

==== OFI dynamic settings ====

num_vcis: 1

num_nics: 1

======================================

Abort(680126095) on node 1: Fatal error in internal_Init: Other MPI error, error stack:

internal_Init(70)................: MPI_Init(argc=(nil), argv=(nil)) failed

MPII_Init_thread(268)............:

MPIR_init_comm_world(34).........:

MPIR_Comm_commit(823)............:

MPID_Comm_commit_post_hook(222)..:

MPIDI_world_post_init(665).......:

MPIDI_OFI_init_vcis(851).........:

check_num_nics(900)..............:

MPIR_Allreduce_allcomm_auto(4726):

MPIC_Sendrecv(308)...............:

MPIC_Wait(90)....................:

MPIR_Wait(784)...................:

(unknown)(): Other MPI error

Abort(680125967) on node 0: Fatal error in internal_Init: Other MPI error, error stack:

internal_Init(70)................: MPI_Init(argc=(nil), argv=(nil)) failed

MPII_Init_thread(268)............:

MPIR_init_comm_world(34).........:

MPIR_Comm_commit(823)............:

MPID_Comm_commit_post_hook(222)..:

MPIDI_world_post_init(665).......:

MPIDI_OFI_init_vcis(851).........:

check_num_nics(900)..............:

MPIR_Allreduce_allcomm_auto(4726):

MPIC_Recv(198)...................:

MPIC_Wait(90)....................:

MPIR_Wait(784)...................:

(unknown)(): Other MPI error

Abort(680650255) on node 3: Fatal error in internal_Init: Other MPI error, error stack:

internal_Init(70)................: MPI_Init(argc=(nil), argv=(nil)) failed

MPII_Init_thread(268)............:

MPIR_init_comm_world(34).........:

MPIR_Comm_commit(823)............:

MPID_Comm_commit_post_hook(222)..:

MPIDI_world_post_init(665).......:

MPIDI_OFI_init_vcis(851).........:

check_num_nics(900)..............:

MPIR_Allreduce_allcomm_auto(4726):

MPIC_Sendrecv(301)...............:

MPID_Isend(63)...................:

MPIDI_isend(35)..................:

(unknown)(): Other MPI error





From: Zhou, Hui <zhouh at anl.gov>
Date: Monday, June 17, 2024 at 12:36 PM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error

It is picking the first interface from the libfabric providers. It appears libfabric is "randomly" ordering the interfaces, which explains the inconsistencies. The interface identified by mpirun is not passed down to the individual MPI processes; it should be passed down and used to select the correct provider instance. We'll add that to our TODOs.



Hui

________________________________

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Monday, June 17, 2024 12:06 PM
To: discuss at mpich.org <discuss at mpich.org>; Zhou, Hui <zhouh at anl.gov>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error




Okay, totally weird.

I tried the -iface option; it did not work. But on the next try, it did …

It chose the correct interface now:



libfabric provider: sockets - fe80::/64



But is there any reason why it's using IPv6?





root@dpr740:/mpich/examples# mpirun -n 5 -hosts 10.118.91.158,10.118.91.159 ./a.out

==== Various sizes and limits ====

sizeof(MPIDI_per_vci_t): 192

Required minimum FI_VERSION: 0, current version: 10015

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 192.168.1.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 10.118.91.158

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] ::1

provider: shm, score = 4, pref = -2, FI_ADDR_STR [13] - fi_shm://895

provider: shm, score = 4, pref = -2, FI_ADDR_STR [13] - fi_shm://895

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1270:fdff:fe18:58bc

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1352

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [12] - fi_sm2://895

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [12] - fi_sm2://895

Required minimum FI_VERSION: 10005, current version: 10015

==== Capability set configuration ====

libfabric provider: sockets - fe80::/64

MPIDI_OFI_ENABLE_DATA: 1

MPIDI_OFI_ENABLE_AV_TABLE: 1

MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 1

MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0

MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0

MPIDI_OFI_ENABLE_MR_ALLOCATED: 0

MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1

MPIDI_OFI_ENABLE_MR_PROV_KEY: 0

MPIDI_OFI_ENABLE_TAGGED: 1

MPIDI_OFI_ENABLE_AM: 1

MPIDI_OFI_ENABLE_RMA: 1

MPIDI_OFI_ENABLE_ATOMICS: 1

MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1

MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1

MPIDI_OFI_ENABLE_TRIGGERED: 0

MPIDI_OFI_ENABLE_HMEM: 0

MPIDI_OFI_NUM_AM_BUFFERS: 8

MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0

MPIDI_OFI_CONTEXT_BITS: 20

MPIDI_OFI_SOURCE_BITS: 0

MPIDI_OFI_TAG_BITS: 31

MPIDI_OFI_VNI_USE_DOMAIN: 1

MAXIMUM SUPPORTED RANKS: 4294967296

MAXIMUM TAG: 2147483648

==== Provider global thresholds ====

max_buffered_send: 255

max_buffered_write: 255

max_msg_size: 9223372036854775807

max_order_raw: -1

max_order_war: -1

max_order_waw: -1

tx_iov_limit: 8

rx_iov_limit: 8

rma_iov_limit: 8

max_mr_key_size: 8

==== Various sizes and limits ====

MPIDI_OFI_AM_MSG_HEADER_SIZE: 24

MPIDI_OFI_MAX_AM_HDR_SIZE: 255

sizeof(MPIDI_OFI_am_request_header_t): 416

sizeof(MPIDI_OFI_per_vci_t): 52480

MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024

MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384

==== OFI dynamic settings ====

num_vcis: 1

num_nics: 1

======================================

error checking    : enabled

QMPI              : disabled

debugger support  : disabled

thread level      : MPI_THREAD_SINGLE

thread CS         : per-vci

threadcomm        : enabled

==== data structure summary ====

sizeof(MPIR_Comm): 1816

sizeof(MPIR_Request): 512

sizeof(MPIR_Datatype): 280

================================

Hello world from process 1 of 5 dpr740

Hello world from process 0 of 5 ampere-2-1

Hello world from process 2 of 5 ampere-2-1

Hello world from process 4 of 5 ampere-2-1

Hello world from process 3 of 5 dpr740













From: Niyaz Murshed via discuss <discuss at mpich.org>
Date: Monday, June 17, 2024 at 12:01 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: Re: [mpich-discuss] mpi hello-world error


I see there is an option called “-iface”:

   -iface                           network interface to use



However, it did not help.
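(For reference, a minimal sketch of how -iface is passed on the command line; "eth0" is a placeholder interface name, not necessarily one present on these hosts.)

   # hypothetical: select the NIC by interface name at launch
   mpirun -iface eth0 -n 2 -hosts 10.118.91.158,10.118.91.159 ./a.out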



From: Zhou, Hui <zhouh at anl.gov>
Date: Monday, June 17, 2024 at 11:48 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error

It is picking 192.168.1.1 as the local address rather than 10.118.91.158. Try using 192.168.1.x for both hosts, or remove the 192.168.1.x network. I don't think we have a way to select the NIC interface. We'll put that in our plans.
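(A minimal sketch of the suggestion above, assuming both hosts have addresses on the 192.168.1.x network; 192.168.1.2 is a hypothetical address for the second host.)

   # hypothetical: launch over the 192.168.1.x network on both ends
   mpirun -n 2 -hosts 192.168.1.1,192.168.1.2 ./a.out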



--

Hui

________________________________

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Monday, June 17, 2024 11:10 AM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error




Please find the output below:


Interestingly, I don't see the verbs provider in the list.





root@dpr740:/mpich/examples# mpirun -n 2 -hosts 10.118.91.158,10.118.91.159 ./a.out

==== Various sizes and limits ====

sizeof(MPIDI_per_vci_t): 192

Required minimum FI_VERSION: 0, current version: 10015

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 192.168.1.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 10.118.91.158

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] ::1

provider: shm, score = 4, pref = -2, FI_ADDR_STR [13] - fi_shm://694

provider: shm, score = 4, pref = -2, FI_ADDR_STR [13] - fi_shm://694

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::42a6:b7ff:fe28:c008

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.158

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::f66b:8cff:fe55:657c

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [12] - fi_sm2://694

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [12] - fi_sm2://694

Required minimum FI_VERSION: 10005, current version: 10015

==== Capability set configuration ====

libfabric provider: sockets - 192.168.1.0/24

MPIDI_OFI_ENABLE_DATA: 1

MPIDI_OFI_ENABLE_AV_TABLE: 1

MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 1

MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0

MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0

MPIDI_OFI_ENABLE_MR_ALLOCATED: 0

MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1

MPIDI_OFI_ENABLE_MR_PROV_KEY: 0

MPIDI_OFI_ENABLE_TAGGED: 1

MPIDI_OFI_ENABLE_AM: 1

MPIDI_OFI_ENABLE_RMA: 1

MPIDI_OFI_ENABLE_ATOMICS: 1

MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1

MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1

MPIDI_OFI_ENABLE_TRIGGERED: 0

MPIDI_OFI_ENABLE_HMEM: 0

MPIDI_OFI_NUM_AM_BUFFERS: 8

MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0

MPIDI_OFI_CONTEXT_BITS: 20

MPIDI_OFI_SOURCE_BITS: 0

MPIDI_OFI_TAG_BITS: 31

MPIDI_OFI_VNI_USE_DOMAIN: 1

MAXIMUM SUPPORTED RANKS: 4294967296

MAXIMUM TAG: 2147483648

==== Provider global thresholds ====

max_buffered_send: 255

max_buffered_write: 255

max_msg_size: 9223372036854775807

max_order_raw: -1

max_order_war: -1

max_order_waw: -1

tx_iov_limit: 8

rx_iov_limit: 8

rma_iov_limit: 8

max_mr_key_size: 8

==== Various sizes and limits ====

MPIDI_OFI_AM_MSG_HEADER_SIZE: 24

MPIDI_OFI_MAX_AM_HDR_SIZE: 255

sizeof(MPIDI_OFI_am_request_header_t): 416

sizeof(MPIDI_OFI_per_vci_t): 52480

MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024

MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

/opt/mpich/lib/libmpi.so.0(+0x59a0fc) [0xffffa661a0fc]

/opt/mpich/lib/libmpi.so.0(+0x4a6b58) [0xffffa6526b58]

/opt/mpich/lib/libmpi.so.0(+0x564740) [0xffffa65e4740]

/opt/mpich/lib/libmpi.so.0(+0x546c14) [0xffffa65c6c14]

/opt/mpich/lib/libmpi.so.0(+0x4f70cc) [0xffffa65770cc]

/opt/mpich/lib/libmpi.so.0(+0x4f9850) [0xffffa6579850]

/opt/mpich/lib/libmpi.so.0(+0x3ffd2c) [0xffffa647fd2c]

/opt/mpich/lib/libmpi.so.0(+0x4017ec) [0xffffa64817ec]

/opt/mpich/lib/libmpi.so.0(+0x3fe384) [0xffffa647e384]

/opt/mpich/lib/libmpi.so.0(+0x436a64) [0xffffa64b6a64]

/opt/mpich/lib/libmpi.so.0(+0x43700c) [0xffffa64b700c]

/opt/mpich/lib/libmpi.so.0(MPI_Init+0x44) [0xffffa61aeeb4]

./a.out(+0x9c4) [0xaaaac2a209c4]

/lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffffa5ef73fc]

/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xffffa5ef74cc]

./a.out(+0x8b0) [0xaaaac2a208b0]

Abort(1) on node 0: Internal error









From: Zhou, Hui <zhouh at anl.gov>
Date: Monday, June 17, 2024 at 11:08 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error

Could you set the env variable MPIR_CVAR_DEBUG_SUMMARY=1 and rerun the test?
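(A minimal sketch of that request: MPICH CVARs can be passed as environment variables, for example via mpiexec's -genv option.)

   # pass the CVAR to all processes for this launch
   mpirun -genv MPIR_CVAR_DEBUG_SUMMARY 1 -n 2 -hosts 10.118.91.158,10.118.91.159 ./a.out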



Hui

________________________________

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Monday, June 17, 2024 11:05 AM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: mpi hello-world error




Yes, one of the hosts.

I have 2 servers.

Hostname1: dpr740/10.118.91.159

Hostname2: ampere-altra-2-1/10.118.91.158



I am running the application on dpr740





Adding both hosts:



root@dpr740:/mpich/examples# mpirun -n 2 -hosts 10.118.91.158,10.118.91.159 ./a.out



Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

/opt/mpich/lib/libmpi.so.0(+0x59a0fc) [0xffffa063a0fc]

/opt/mpich/lib/libmpi.so.0(+0x4a6b58) [0xffffa0546b58]

/opt/mpich/lib/libmpi.so.0(+0x564740) [0xffffa0604740]

/opt/mpich/lib/libmpi.so.0(+0x546c14) [0xffffa05e6c14]

/opt/mpich/lib/libmpi.so.0(+0x4f70cc) [0xffffa05970cc]

/opt/mpich/lib/libmpi.so.0(+0x4f9850) [0xffffa0599850]

/opt/mpich/lib/libmpi.so.0(+0x3ffd2c) [0xffffa049fd2c]

/opt/mpich/lib/libmpi.so.0(+0x4017ec) [0xffffa04a17ec]

/opt/mpich/lib/libmpi.so.0(+0x3fe384) [0xffffa049e384]

/opt/mpich/lib/libmpi.so.0(+0x436a64) [0xffffa04d6a64]

/opt/mpich/lib/libmpi.so.0(+0x43700c) [0xffffa04d700c]

/opt/mpich/lib/libmpi.so.0(MPI_Init+0x44) [0xffffa01ceeb4]

./a.out(+0x9c4) [0xaaaacd3309c4]

/lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffff9ff173fc]

/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xffff9ff174cc]

./a.out(+0x8b0) [0xaaaacd3308b0]

Abort(1) on node 0: Internal error



[mpiexec@dpr740] HYDU_sock_write (lib/utils/sock.c:250): write error (Bad file descriptor)

[mpiexec@dpr740] send_hdr_downstream (mpiexec/pmiserv_cb.c:28): sock write error

^C[mpiexec@dpr740] Sending Ctrl-C to processes as requested

[mpiexec@dpr740] Press Ctrl-C again to force abort

[mpiexec@dpr740] HYDU_sock_write (lib/utils/sock.c:250): write error (Bad file descriptor)

[mpiexec@dpr740] send_hdr_downstream (mpiexec/pmiserv_cb.c:28): sock write error

[mpiexec@dpr740] HYD_pmcd_pmiserv_send_signal (mpiexec/pmiserv_cb.c:218): unable to write data to proxy

[mpiexec@dpr740] ui_cmd_cb (mpiexec/pmiserv_pmci.c:61): unable to send signal downstream

[mpiexec@dpr740] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status

[mpiexec@dpr740] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:173): error waiting for event

[mpiexec@dpr740] main (mpiexec/mpiexec.c:260): process manager error waiting for completion





If I just add the remote host, it will run successfully.



root@dpr740:/mpich/examples# mpirun -n 2 -hosts 10.118.91.158 ./a.out

Hello world from process 0 of 2

Hello world from process 1 of 2













From: Zhou, Hui <zhouh at anl.gov>
Date: Monday, June 17, 2024 at 10:33 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Subject: Re: mpi hello-world error

Alright. Let's focus on the case of two fixed nodes running:



   mpirun  -n 2 -hosts 10.118.91.158,10.118.91.159 ./a.out

Is the error consistent every time?

Are you running the command from one of the hosts? Out of curiosity, why do the host names look like they come from two different naming systems?



--

Hui

________________________________

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Monday, June 17, 2024 10:23 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Zhou, Hui <zhouh at anl.gov>
Subject: Re: mpi hello-world error




Hi Hui,

Apologies for that; I just assumed more logs would give more information.



Yes, both servers are on the same network.

In the first email, I can run the hello-world application from server1 to server2 and vice versa.



It's only when I add both servers to the -hosts parameter that the error is seen.




________________________________

From: Zhou, Hui via discuss <discuss at mpich.org>
Sent: Monday, June 17, 2024 9:41:50 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Zhou, Hui <zhouh at anl.gov>
Subject: Re: [mpich-discuss] mpi hello-world error



Niyaz,



I am quite lost on the errors you encountered. The three errors seem all over the place.  Are the two hosts on the same local network?



--

Hui

________________________________

From: Niyaz Murshed via discuss <discuss at mpich.org>
Sent: Monday, June 17, 2024 1:07 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>; nd <nd at arm.com>
Subject: Re: [mpich-discuss] mpi hello-world error




What is the best way to understand this log?





[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_STDERR

Abort(680650255) on node 1: Fatal error in internal_Init: Other MPI error, error stack:

internal_Init(70)................: MPI_Init(argc=(nil), argv=(nil)) failed

MPII_Init_thread(268)............:

MPIR_init_comm_world(34).........:

MPIR_Comm_commit(823)............:

MPID_Comm_commit_post_hook(222)..:

MPIDI_world_post_init(665).......:

MPIDI_OFI_init_vcis(851).........:

check_num_nics(900)..............:

MPIR_Allreduce_allcomm_auto(4726):

MPIC_Sendrecv(301)...............:

MPID_Isend(63)...................:

MPIDI_isend(35)..................:

(unknown)(): Other MPI error

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_EXIT_STATUS



[proxy:0 at cesw-amp-gbt-2s-m12830-01] Sending upstream hdr.cmd = CMD_STDERR

Abort(680650255) on node 0: Fatal error in internal_Init: Other MPI error, error stack:

internal_Init(70)................: MPI_Init(argc=(nil), argv=(nil)) failed

MPII_Init_thread(268)............:

MPIR_init_comm_world(34).........:

MPIR_Comm_commit(823)............:

MPID_Comm_commit_post_hook(222)..:

MPIDI_world_post_init(665).......:

MPIDI_OFI_init_vcis(851).........:

check_num_nics(900)..............:

MPIR_Allreduce_allcomm_auto(4726):

MPIC_Sendrecv(301)...............:

MPID_Isend(63)...................:

MPIDI_isend(35)..................:

(unknown)(): Other MPI error

[proxy:0 at cesw-amp-gbt-2s-m12830-01] Sending upstream hdr.cmd = CMD_EXIT_STATUS







From: Niyaz Murshed via discuss <discuss at mpich.org>
Date: Saturday, June 15, 2024 at 10:53 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: Re: [mpich-discuss] mpi hello-world error


Also seeing this error sometimes.





root at dpr740:/mpich/examples# export FI_PROVIDER=tcp

root at dpr740:/mpich/examples# mpirun  -verbose -n 2 -hosts 10.118.91.158,10.118.91.159 ./a.out

host: 10.118.91.158

host: 10.118.91.159

[mpiexec at dpr740] Timeout set to -1 (-1 means infinite)



==================================================================================================

mpiexec options:

----------------

  Base path: /opt/mpich/bin/

  Launcher: (null)

  Debug level: 1

  Enable X: -1



  Global environment:

  -------------------

    PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig

    HOSTNAME=dpr740

    HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233

    PWD=/mpich/examples

    HOME=/root

    FI_PROVIDER=tcp

    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:

    LESSCLOSE=/usr/bin/lesspipe %s %s

    TERM=xterm

    LESSOPEN=| /usr/bin/lesspipe %s

    SHLVL=1

    LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib

    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin

    OLDPWD=/

    _=/opt/mpich/bin/mpirun



  Hydra internal environment:

  ---------------------------

    GFORTRAN_UNBUFFERED_PRECONNECTED=y





    Proxy information:

    *********************

      [1] proxy: 10.118.91.158 (1 cores)

      Exec list: ./a.out (1 processes);



      [2] proxy: 10.118.91.159 (1 cores)

      Exec list: ./a.out (1 processes);





==================================================================================================





Proxy launch args: /opt/mpich/bin/hydra_pmi_proxy --control-port 10.118.91.159:35625 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id



Arguments being passed to proxy 0:

--version 4.3.0a1 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname 10.118.91.158 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_1151_0_1450155337_dpr740 --pmi-process-mapping (vector,(0,2,1)) --global-inherited-env 15 'PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig' 'HOSTNAME=dpr740' 'HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233' 'PWD=/mpich/examples' 'HOME=/root' 'FI_PROVIDER=tcp' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'TERM=xterm' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SHLVL=1' 'LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib' 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin' 'OLDPWD=/' '_=/opt/mpich/bin/mpirun' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /mpich/examples --exec-args 1 ./a.out



Arguments being passed to proxy 1:

--version 4.3.0a1 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname 10.118.91.159 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_1151_0_1450155337_dpr740 --pmi-process-mapping (vector,(0,2,1)) --global-inherited-env 15 'PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig' 'HOSTNAME=dpr740' 'HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233' 'PWD=/mpich/examples' 'HOME=/root' 'FI_PROVIDER=tcp' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'TERM=xterm' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SHLVL=1' 'LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib' 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin' 'OLDPWD=/' '_=/opt/mpich/bin/mpirun' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /mpich/examples --exec-args 1 ./a.out



[mpiexec at dpr740] Launch arguments: /usr/bin/ssh -x -p 2233 10.118.91.158 "/opt/mpich/bin/hydra_pmi_proxy" --control-port 10.118.91.159:35625 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id 0

[mpiexec at dpr740] Launch arguments: /opt/mpich/bin/hydra_pmi_proxy --control-port 10.118.91.159:35625 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id 1

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_PID_LIST

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=init pmi_version=1 pmi_subversion=1

[proxy:1 at dpr740] Sending PMI command:

    cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get_maxes

[proxy:1 at dpr740] Sending PMI command:

    cmd=maxes rc=0 kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get_appnum

[proxy:1 at dpr740] Sending PMI command:

    cmd=appnum rc=0 appnum=0

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get_my_kvsname

[proxy:1 at dpr740] Sending PMI command:

    cmd=my_kvsname rc=0 kvsname=kvs_1151_0_1450155337_dpr740

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_process_mapping

[proxy:1 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=(vector,(0,2,1)) found=TRUE

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_hwloc_xmlfile

[proxy:1 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=/tmp/hydra_hwloc_xmlfile_YPoAhr found=TRUE

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:1 at dpr740] Sending upstream internal PMI command:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_mpi_memory_alloc_kinds



[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=get_result rc=1

[proxy:1 at dpr740] we don't understand the response get_result; forwarding downstream

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:1 at dpr740] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PID_LIST

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=init pmi_version=1 pmi_subversion=1

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_maxes

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=maxes rc=0 kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_appnum

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=appnum rc=0 appnum=0

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_my_kvsname

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=my_kvsname rc=0 kvsname=kvs_1151_0_1450155337_dpr740

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_process_mapping

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=(vector,(0,2,1)) found=TRUE

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_hwloc_xmlfile

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=/tmp/hydra_hwloc_xmlfile_68iqm3 found=TRUE

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_mpi_memory_alloc_kinds



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=get_result rc=1

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:0 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:0 at ampere-altra-2-1] we don't understand the response get_result; forwarding downstream

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=barrier_out

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=barrier_out

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:0 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:0 at ampere-altra-2-1] [proxy:1 at dpr740] Sending PMI command:

    cmd=barrier_out

Sending upstream hdr.cmd = CMD_PMI

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=barrier_out

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=put kvsname=kvs_1151_0_1450155337_dpr740 key=-allgather-shm-1-0 value=0200937DC0A80101[8]

[proxy:0 at ampere-altra-2-1] cached command: -allgather-shm-1-0=0200937DC0A80101[8]

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=put_result rc=0

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=mput -allgather-shm-1-0=0200937DC0A80101[8]



[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:0 at ampere-altra-2-1] flushing 1 put command(s) out

[proxy:0 at ampere-altra-2-1] forwarding command upstream:

cmd=mput -allgather-shm-1-0=0200937DC0A80101[8]

[proxy:0 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=mput -allgather-shm-1-0=0200937DC0A80101[8]

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:0 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=put kvsname=kvs_1151_0_1450155337_dpr740 key=-allgather-shm-1-1 value=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:1 at dpr740] cached command: -allgather-shm-1-1=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:1 at dpr740] Sending PMI command:

    cmd=put_result rc=0

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:1 at dpr740] flushing 1 put command(s) out

[proxy:1 at dpr740] forwarding command upstream:

cmd=mput -allgather-shm-1-1=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:1 at dpr740] Sending upstream internal PMI command:

    cmd=mput -allgather-shm-1-1=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at dpr740] Sending upstream internal PMI command:

    cmd=barrier_in

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=mput -allgather-shm-1-1=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]



[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=keyval_cache -allgather-shm-1-0=0200937DC0A80101[8] -allgather-shm-1-1=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=keyval_cache -allgather-shm-1-0=0200937DC0A80101[8] -allgather-shm-1-1=0A00B381[4]FE80[6]526B4BFFFEFC134208[3]

[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=barrier_out

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=barrier_out

[proxy:1 at dpr740] Sending PMI command:

    cmd=barrier_out

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=-allgather-shm-1-0

[proxy:1 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=0200937DC0A80101[8] found=TRUE

[proxy:1 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=-allgather-shm-1-1

[proxy:1 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=0A00B381[4]FE80[6]526B4BFFFEFC134208[3] found=TRUE

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=barrier_out

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_STDERR

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=-allgather-shm-1-0

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=0200937DC0A80101[8] found=TRUE

[proxy:0 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_1151_0_1450155337_dpr740 key=-allgather-shm-1-1

[proxy:0 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=0A00B381[4]FE80[6]526B4BFFFEFC134208[3] found=TRUE

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_STDERR

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_STDERR

/opt/mpich/lib/libmpi.so.0(+0x58c005) [0x7f967920c005]

/opt/mpich/lib/libmpi.so.0(+0x491858) [0x7f9679111858]

/opt/mpich/lib/libmpi.so.0(+0x55428c) [0x7f96791d428c]

/opt/mpich/lib/libmpi.so.0(+0x53402d) [0x7f96791b402d]

/opt/mpich/lib/libmpi.so.0(+0x4dc71f) [0x7f967915c71f]

/opt/mpich/lib/libmpi.so.0(+0x4df09a) [0x7f967915f09a]

/opt/mpich/lib/libmpi.so.0(+0x3deab6) [0x7f967905eab6]

/opt/mpich/lib/libmpi.so.0(+0x3e0732) [0x7f9679060732]

/opt/mpich/lib/libmpi.so.0(+0x3dd075) [0x7f967905d075]

/opt/mpich/lib/libmpi.so.0(+0x418215) [0x7f9679098215]

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_STDERR

/opt/mpich/lib/libmpi.so.0(+0x4188fa) [0x7f96790988fa]

/opt/mpich/lib/libmpi.so.0(MPI_Init+0x34) [0x7f9678d57594]

./a.out(+0x121a) [0x55b07f1cc21a]

/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f9678a7cd90]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f9678a7ce40]

./a.out(+0x1125) [0x55b07f1cc125]

Abort(1) on node 1: Internal error

/opt/mpich/lib/libmpi.so.0(+0x59a0fc) [0xffff91d0a0fc]

/opt/mpich/lib/libmpi.so.0(+0x4a6b58) [0xffff91c16b58]

/opt/mpich/lib/libmpi.so.0(+0x564740) [0xffff91cd4740]

/opt/mpich/lib/libmpi.so.0(+0x546c14) [0xffff91cb6c14]

/opt/mpich/lib/libmpi.so.0(+0x4f70cc) [0xffff91c670cc]

/opt/mpich/lib/libmpi.so.0(+0x4f9850) [0xffff91c69850]

/opt/mpich/lib/libmpi.so.0(+0x3ffd2c) [0xffff91b6fd2c]

/opt/mpich/lib/libmpi.so.0(+0x4017ec) [0xffff91b717ec]

/opt/mpich/lib/libmpi.so.0(+0x3fe384) [0xffff91b6e384]

/opt/mpich/lib/libmpi.so.0(+0x436a64) [0xffff91ba6a64]

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_STDERR

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_STDERR

/opt/mpich/lib/libmpi.so.0(+0x43700c) [0xffff91ba700c]

/opt/mpich/lib/libmpi.so.0(MPI_Init+0x44) [0xffff9189eeb4]

./a.out(+0x9c4) [0xaaaab5c709c4]

/lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffff915e73fc]

/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xffff915e74cc]

./a.out(+0x8b0) [0xaaaab5c708b0]

Abort(1) on node 0: Internal error

[proxy:1 at dpr740] Sending upstream hdr.cmd = CMD_EXIT_STATUS

[proxy:0 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_EXIT_STATUS
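
The cmd=put / barrier_in / barrier_out / cmd=get traffic in the verbose output above is the PMI key-value exchange the two ranks use to publish and fetch each other's endpoint data before any fabric connection exists. A rough stand-alone sketch of that same pattern against the PMI-1 API follows; the keys and values are made up for the illustration, and the header and library locations assume an MPICH install that exposes pmi.h.

    /* Sketch of the PMI-1 put/commit/barrier/get pattern seen in the Hydra log.
     * Run it under mpiexec so Hydra provides the PMI connection; link against
     * whatever library carries the PMI-1 symbols in your MPICH install. */
    #include <stdio.h>
    #include <pmi.h>

    int main(void)
    {
        int spawned, rank, size;
        char kvsname[256], key[64], value[256];

        PMI_Init(&spawned);
        PMI_Get_rank(&rank);
        PMI_Get_size(&size);
        PMI_KVS_Get_my_name(kvsname, sizeof(kvsname));

        /* Publish this rank's entry (a made-up payload for the demo). */
        snprintf(key, sizeof(key), "demo-addr-%d", rank);
        snprintf(value, sizeof(value), "payload-from-rank-%d", rank);
        PMI_KVS_Put(kvsname, key, value);
        PMI_KVS_Commit(kvsname);

        /* Corresponds to barrier_in/barrier_out in the log: after this,
         * every rank's committed keys are visible to every other rank. */
        PMI_Barrier();

        /* Fetch each peer's entry, like the -allgather-shm-* gets above. */
        for (int i = 0; i < size; i++) {
            snprintf(key, sizeof(key), "demo-addr-%d", i);
            if (PMI_KVS_Get(kvsname, key, value, sizeof(value)) == PMI_SUCCESS)
                printf("rank %d sees %s = %s\n", rank, key, value);
        }

        PMI_Finalize();
        return 0;
    }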





From: Niyaz Murshed via discuss <discuss at mpich.org>
Date: Saturday, June 15, 2024 at 12:10 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: [mpich-discuss] mpi hello-world error


Hello,



I am trying to run the example hellow.c between 2 servers.

I can run it on each server individually and it works fine.



10.118.91.158  is the machine I am running on.

10.118.91.159 is the remote machine.
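
(For reference, a.out here is built from the hellow.c example; a minimal program roughly equivalent to it, producing the same "Hello world from process X of Y" output, is sketched below.)

    /* Minimal hello-world, roughly equivalent to MPICH's examples/hellow.c.
     * Build: mpicc hellow.c -o a.out */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }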



root at dpr740:/mpich/examples# mpirun  -n 2 -hosts 10.118.91.158  ./a.out

Hello world from process 0 of 2

Hello world from process 1 of 2



root at dpr740:/mpich/examples# mpirun  -n 2 -hosts 10.118.91.159  ./a.out

Hello world from process 1 of 2

Hello world from process 0 of 2



However, when I try to run across both, I get the error below.

realloc(): invalid pointer



Is this a known issue? Any suggestions?





root at dpr740:/mpich/examples# mpirun -verbose  -n 2 -hosts 10.118.91.159,10.118.91.158  ./a.out

host: 10.118.91.159

host: 10.118.91.158

[mpiexec at dpr740] Timeout set to -1 (-1 means infinite)



==================================================================================================

mpiexec options:

----------------

  Base path: /opt/mpich/bin/

  Launcher: (null)

  Debug level: 1

  Enable X: -1



  Global environment:

  -------------------

    PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig

    HOSTNAME=dpr740

    HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233

    PWD=/mpich/examples

    HOME=/root

    LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:

    LESSCLOSE=/usr/bin/lesspipe %s %s

    TERM=xterm

    LESSOPEN=| /usr/bin/lesspipe %s

    SHLVL=1

    LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib

    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin

    _=/opt/mpich/bin/mpirun

    OLDPWD=/



  Hydra internal environment:

  ---------------------------

    GFORTRAN_UNBUFFERED_PRECONNECTED=y





    Proxy information:

    *********************

      [1] proxy: 10.118.91.159 (1 cores)

      Exec list: ./a.out (1 processes);



      [2] proxy: 10.118.91.158 (1 cores)

      Exec list: ./a.out (1 processes);





==================================================================================================





Proxy launch args: /opt/mpich/bin/hydra_pmi_proxy --control-port 10.118.91.159:33909 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id



Arguments being passed to proxy 0:

--version 4.3.0a1 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname 10.118.91.159 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_844_0_801938186_dpr740 --pmi-process-mapping (vector,(0,2,1)) --global-inherited-env 14 'PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig' 'HOSTNAME=dpr740' 'HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233' 'PWD=/mpich/examples' 'HOME=/root' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'TERM=xterm' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SHLVL=1' 'LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib' 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin' '_=/opt/mpich/bin/mpirun' 'OLDPWD=/' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /mpich/examples --exec-args 1 ./a.out



Arguments being passed to proxy 1:

--version 4.3.0a1 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname 10.118.91.158 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_844_0_801938186_dpr740 --pmi-process-mapping (vector,(0,2,1)) --global-inherited-env 14 'PKG_CONFIG_PATH=:/opt/libfabric/lib/pkgconfig:/opt/mpich/lib/pkgconfig' 'HOSTNAME=dpr740' 'HYDRA_LAUNCHER_EXTRA_ARGS=-p 2233' 'PWD=/mpich/examples' 'HOME=/root' 'LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:' 'LESSCLOSE=/usr/bin/lesspipe %s %s' 'TERM=xterm' 'LESSOPEN=| /usr/bin/lesspipe %s' 'SHLVL=1' 'LD_LIBRARY_PATH=:/opt/libfabric/lib:/opt/fabtests/lib:/opt/mpich/lib' 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/libfabric/bin/:/opt/fabtest/bin:/opt/mpich/bin' '_=/opt/mpich/bin/mpirun' 'OLDPWD=/' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /mpich/examples --exec-args 1 ./a.out



[mpiexec at dpr740] Launch arguments: /opt/mpich/bin/hydra_pmi_proxy --control-port 10.118.91.159:33909 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id 0

[mpiexec at dpr740] Launch arguments: /usr/bin/ssh -x -p 2233 10.118.91.158 "/opt/mpich/bin/hydra_pmi_proxy" --control-port 10.118.91.159:33909 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --pmi-port 0 --gpus-per-proc -2 --gpu-subdevs-per-proc -2 --proxy-id 1

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PID_LIST

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=init pmi_version=1 pmi_subversion=1

[proxy:0 at dpr740] Sending PMI command:

    cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get_maxes

[proxy:0 at dpr740] Sending PMI command:

    cmd=maxes rc=0 kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get_appnum

[proxy:0 at dpr740] Sending PMI command:

    cmd=appnum rc=0 appnum=0

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get_my_kvsname

[proxy:0 at dpr740] Sending PMI command:

    cmd=my_kvsname rc=0 kvsname=kvs_844_0_801938186_dpr740

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_process_mapping

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=(vector,(0,2,1)) found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_hwloc_xmlfile

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=/tmp/hydra_hwloc_xmlfile_CeNRJN found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=get_result rc=1

[proxy:0 at dpr740] we don't understand the response get_result; forwarding downstream

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PID_LIST

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=init pmi_version=1 pmi_subversion=1

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=response_to_init rc=0 pmi_version=1 pmi_subversion=1

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_maxes

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=maxes rc=0 kvsname_max=256 keylen_max=64 vallen_max=1024

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_appnum

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=appnum rc=0 appnum=0

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get_my_kvsname

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=my_kvsname rc=0 kvsname=kvs_844_0_801938186_dpr740

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_process_mapping

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=(vector,(0,2,1)) found=TRUE

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_hwloc_xmlfile

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=/tmp/hydra_hwloc_xmlfile_xv8EIG found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds



[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=get_result rc=1

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=PMI_mpi_memory_alloc_kinds

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] we don't understand the response get_result; forwarding downstream

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=barrier_out

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=barrier_out

[proxy:0 at dpr740] Sending PMI command:

    cmd=barrier_out

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=barrier_out

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=put kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-0 value=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] [proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=put kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-1 value=0200A8BFC0A80101[8]

cached command: -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] Sending PMI command:

    cmd=put_result rc=0

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:0 at dpr740] flushing 1 put command(s) out

[proxy:0 at dpr740] forwarding command upstream:

cmd=mput -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=mput -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] cached command: -allgather-shm-1-1=0200A8BFC0A80101[8]

[proxy:1 at ampere-altra-2-1] Sending PMI command:

[proxy:0 at dpr740] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:0 at dpr740] Sending upstream hdr.cmd = CMD_PMI

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=mput -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3]



    cmd=put_result rc=0

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=mput -allgather-shm-1-1=0200A8BFC0A80101[8]



[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] flushing 1 put command(s) out

[proxy:1 at ampere-altra-2-1] forwarding command upstream:

[mpiexec at dpr740] [pgid: 0] got PMI command: cmd=barrier_in



[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=keyval_cache -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] -allgather-shm-1-1=0200A8BFC0A80101[8]

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=keyval_cache -allgather-shm-1-0=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] -allgather-shm-1-1=0200A8BFC0A80101[8]

[mpiexec at dpr740] Sending internal PMI command (proxy:0:0):

    cmd=barrier_out

[mpiexec at dpr740] Sending internal PMI command (proxy:0:1):

    cmd=barrier_out

[proxy:0 at dpr740] Sending PMI command:

    cmd=barrier_out

cmd=mput -allgather-shm-1-1=0200A8BFC0A80101[8]

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=mput -allgather-shm-1-1=0200A8BFC0A80101[8]

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:1 at ampere-altra-2-1] Sending upstream internal PMI command:

    cmd=barrier_in

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_PMI

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-0

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] found=TRUE

[proxy:0 at dpr740] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-1

[proxy:0 at dpr740] Sending PMI command:

    cmd=get_result rc=0 value=0200A8BFC0A80101[8] found=TRUE

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=barrier_out

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-0

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=0A00812D[4]FE80[6]526B4BFFFEFC134208[3] found=TRUE

[proxy:1 at ampere-altra-2-1] got pmi command from downstream 0-0:

    cmd=get kvsname=kvs_844_0_801938186_dpr740 key=-allgather-shm-1-1

[proxy:1 at ampere-altra-2-1] Sending PMI command:

    cmd=get_result rc=0 value=0200A8BFC0A80101[8] found=TRUE

realloc(): invalid pointer

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_STDERR



===================================================================================

=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

=   PID 2404 RUNNING AT 10.118.91.158

=   EXIT CODE: 134

=   CLEANING UP REMAINING PROCESSES

=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===================================================================================

[proxy:1 at ampere-altra-2-1] Sending upstream hdr.cmd = CMD_EXIT_STATUS

[proxy:0 at dpr740] HYD_pmcd_pmip_control_cmd_cb (proxy/pmip_cb.c:484): assert (!closed) failed

[proxy:0 at dpr740] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status

[proxy:0 at dpr740] main (proxy/pmip.c:122): demux engine error waiting for event

[mpiexec at dpr740] HYDT_bscu_wait_for_completion (lib/tools/bootstrap/utils/bscu_wait.c:109): one of the processes terminated badly; aborting

[mpiexec at dpr740] HYDT_bsci_wait_for_completion (lib/tools/bootstrap/src/bsci_wait.c:21): launcher returned error waiting for completion

[mpiexec at dpr740] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:189): launcher returned error waiting for completion

[mpiexec at dpr740] main (mpiexec/mpiexec.c:260): process manager error waiting for completion







Regards,

Niyaz


