[mpich-discuss] Mpich over RDMA sample

Niyaz Murshed Niyaz.Murshed at arm.com
Tue Jun 18 11:10:51 CDT 2024


Okay, let me keep digging.
If verbs;ofi_rxm is good, then my understanding is that I should see the RoCE messages over the wire, correct?
i.e., once I can get the sample application working.
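
As a sanity check on the wire: RoCEv2 is carried as UDP with destination port 4791, so a capture along the lines of the following on the RDMA-facing interface should show the traffic if it really goes over RoCE (the interface name is only a placeholder):

tcpdump -i <rdma-iface> -nn udp dst port 4791
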
Currently, we still see the error as below. I will try to use your patch for interface selection to see what happens.


==== Capability set configuration ====

libfabric provider: verbs;ofi_rxm - IB-0xfe80000000000000


Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL


From: Zhou, Hui <zhouh at anl.gov>
Date: Tuesday, June 18, 2024 at 11:05 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
Niyaz,

FYI - https://ofiwg.github.io/libfabric/v1.21.0/man/fi_provider.7.html


Hui
________________________________
From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Tuesday, June 18, 2024 11:00 AM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Hi Hui,

When you say message semantics, do you mean the message endpoint type?



MPICH is running over libfabric.

When I run the sample application on libfabric, it does use message endpoints for RMA.





From: Zhou, Hui <zhouh at anl.gov>
Date: Tuesday, June 18, 2024 at 9:44 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

The "verbs"​ provider alone does not provide the message semantics we needed for MPI. "ofi_rxm" is the utility provider that provides the message semantics. "verbs;ofi_rxm" is the provider combination you want.



Hui

________________________________

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Tuesday, June 18, 2024 9:14 AM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample



Hi Hui,



I have set the environment variable FI_PROVIDER=verbs, after which I get the below.

libfabric provider: verbs;ofi_rxm - IB-0xfe80000000000000

It does not choose just “verbs”; it selects verbs;ofi_rxm.



I want to select the below:

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200





I would really like to thank you for such fast responses.



From: Zhou, Hui <zhouh at anl.gov>
Date: Tuesday, June 18, 2024 at 8:58 AM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Niyaz,



Set the environment variable FI_PROVIDER=verbs to select the verbs provider.



Hui

________________________________

From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Sent: Monday, June 17, 2024 8:57 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample



Hi Hui,



I still need some assistance with this. I am not able to make MPICH choose the verbs provider ☹

With this log line:

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200



Am I right to assume that the above log line means the verbs provider knows about a path to 192.168.2.200?

How does the hello-world application decide which provider to use?



Running on server: 192.168.2.100

root at ampere-altra-2-1# FI_PROVIDER=verbs MPIR_CVAR_DEBUG_SUMMARY=1 mpirun  -n 5 -hosts 192.168.2.200,192.168.2.100 /mpich/examples/a.out



==== Various sizes and limits ====

sizeof(MPIDI_per_vci_t): 192

Required minimum FI_VERSION: 0, current version: 10015

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

Required minimum FI_VERSION: 10006, current version: 10015

==== Capability set configuration ====

libfabric provider: verbs;ofi_rxm - IB-0xfe80000000000000

MPIDI_OFI_ENABLE_DATA: 0

MPIDI_OFI_ENABLE_AV_TABLE: 1

MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 0

MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0

MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 1

MPIDI_OFI_ENABLE_MR_ALLOCATED: 1

MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1

MPIDI_OFI_ENABLE_MR_PROV_KEY: 1

MPIDI_OFI_ENABLE_TAGGED: 1

MPIDI_OFI_ENABLE_AM: 1

MPIDI_OFI_ENABLE_RMA: 1

MPIDI_OFI_ENABLE_ATOMICS: 0

MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1

MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1

MPIDI_OFI_ENABLE_TRIGGERED: 0

MPIDI_OFI_ENABLE_HMEM: 0

MPIDI_OFI_NUM_AM_BUFFERS: 8

MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0

MPIDI_OFI_CONTEXT_BITS: 16

MPIDI_OFI_SOURCE_BITS: 23

MPIDI_OFI_TAG_BITS: 20

MPIDI_OFI_VNI_USE_DOMAIN: 1

MAXIMUM SUPPORTED RANKS: 8388608

MAXIMUM TAG: 1048576

==== Provider global thresholds ====

max_buffered_send: 192

max_buffered_write: 192

max_msg_size: 1073741824

max_order_raw: 1073741824

max_order_war: 0

max_order_waw: 1073741824

tx_iov_limit: 4

rx_iov_limit: 4

rma_iov_limit: 1

max_mr_key_size: 4

==== Various sizes and limits ====

MPIDI_OFI_AM_MSG_HEADER_SIZE: 24

MPIDI_OFI_MAX_AM_HDR_SIZE: 255

sizeof(MPIDI_OFI_am_request_header_t): 416

sizeof(MPIDI_OFI_per_vci_t): 52480

MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024

MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL

Assertion failed in file src/mpid/ch4/netmod/ofi/init_addrxchg.c at line 151: mapped_table[i] != FI_ADDR_NOTAVAIL







If I do not set FI_PROVIDER=verbs, it works over the sockets provider.



root at ampere-altra-2-1: # MPIR_CVAR_DEBUG_SUMMARY=1 mpirun  -n 5 -hosts 192.168.2.200,192.168.2.100 /mpich/examples/a.out



==== Various sizes and limits ====

sizeof(MPIDI_per_vci_t): 192

Required minimum FI_VERSION: 0, current version: 10015

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs, score = 0, pref = 0, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: verbs;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IB [48]

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: verbs;ofi_rxd, score = 5, pref = -2, FI_FORMAT_UNSPEC [32]

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 192.168.2.200

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 10.118.91.159

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] ::1

provider: shm, score = 4, pref = -2, FI_ADDR_STR [14] - fi_shm://1177

provider: shm, score = 4, pref = -2, FI_ADDR_STR [14] - fi_shm://1177

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.2.200

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::526b:4bff:fefc:1342

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 10.118.91.159

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::be97:e1ff:fe9d:7caa

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1

provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [13] - fi_sm2://1177

provider: sm2, score = 3, pref = 0, FI_ADDR_STR [13] - fi_sm2://1177

Required minimum FI_VERSION: 10005, current version: 10015

==== Capability set configuration ====

libfabric provider: sockets - 192.168.2.0/24

MPIDI_OFI_ENABLE_DATA: 1

MPIDI_OFI_ENABLE_AV_TABLE: 1

MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 1

MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0

MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0

MPIDI_OFI_ENABLE_MR_ALLOCATED: 0

MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1

MPIDI_OFI_ENABLE_MR_PROV_KEY: 0

MPIDI_OFI_ENABLE_TAGGED: 1

MPIDI_OFI_ENABLE_AM: 1

MPIDI_OFI_ENABLE_RMA: 1

MPIDI_OFI_ENABLE_ATOMICS: 1

MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1

MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0

MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1

MPIDI_OFI_ENABLE_TRIGGERED: 0

MPIDI_OFI_ENABLE_HMEM: 0

MPIDI_OFI_NUM_AM_BUFFERS: 8

MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0

MPIDI_OFI_CONTEXT_BITS: 20

MPIDI_OFI_SOURCE_BITS: 0

MPIDI_OFI_TAG_BITS: 31

MPIDI_OFI_VNI_USE_DOMAIN: 1

MAXIMUM SUPPORTED RANKS: 4294967296

MAXIMUM TAG: 2147483648

==== Provider global thresholds ====

max_buffered_send: 255

max_buffered_write: 255

max_msg_size: 9223372036854775807

max_order_raw: -1

max_order_war: -1

max_order_waw: -1

tx_iov_limit: 8

rx_iov_limit: 8

rma_iov_limit: 8

max_mr_key_size: 8

==== Various sizes and limits ====

MPIDI_OFI_AM_MSG_HEADER_SIZE: 24

MPIDI_OFI_MAX_AM_HDR_SIZE: 255

sizeof(MPIDI_OFI_am_request_header_t): 416

sizeof(MPIDI_OFI_per_vci_t): 52480

MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024

MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384

==== OFI dynamic settings ====

num_vcis: 1

num_nics: 1

======================================

error checking    : enabled

QMPI              : disabled

debugger support  : disabled

thread level      : MPI_THREAD_SINGLE

thread CS         : per-vci

threadcomm        : enabled

==== data structure summary ====

sizeof(MPIR_Comm): 1808

sizeof(MPIR_Request): 512

sizeof(MPIR_Datatype): 280

================================

Hello world from process 0 of 5

Hello world from process 2 of 5

Hello world from process 1 of 5

Hello world from process 4 of 5

Hello world from process 3 of 5







From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 7:22 PM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Subject: Re: Mpich over RDMA sample

It’s possible that the native path for RMA on verbs is turned off because verbs does not meet MPI’s full atomic-ordering semantics. We’ll check and follow up.



--
Hui Zhou





From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 3:59 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

I think the image did not go through, so here is a link: https://ibb.co/RDGnJfy



From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 3:56 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Thank you Hui for the reply.



I was expecting to see RoCE messages in the packet capture.



When I use verbs with the libfabric sample tests, I see the RoCE messages as below:

[Screenshot (image001.png): packet capture showing RoCE messages]







From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 3:41 PM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Niyaz,



verbs;ofi_rxm is the correct provider. ofi_rxm layers on top of the verbs provider to supply the additional messaging semantics that MPI needs.

--
Hui Zhou





From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 2:54 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Thank you for the reply.



I did that; however, it selects verbs;ofi_rxm



libfabric:412098:1718221892::ofi_rxm:av:ofi_av_insert_addr():313<info> fi_addr: 1

libfabric:412099:1718221892::ofi_rxm:av:ofi_av_insert_addr():313<info> fi_addr: 1



options:

  backend:        cpu

  iters:          16

  warmup_iters:   16

  cache:          1

  min_elem_count: 1

  max_elem_count: 1

  elem_counts:    [1]

  validate:       last

  window_size:    64

#------------------------------------------------------------

# Benchmarking: Bandwidth

# #processes: 2

#------------------------------------------------------------



        #bytes      #repetitions        Mbytes/sec

             4                16              0.02



# All done



libfabric:412098:1718221892::ofi_rxm:ep_ctrl:rxm_stop_listen():864<info> stopping CM thread

libfabric:412099:1718221892::ofi_rxm:ep_ctrl:rxm_stop_listen():864<info> stopping CM thread







As per https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/ofi-providers-support.html , we need to add FI_PROVIDER=^ofi_rxm, but if I do that, it moves back to the sockets provider.

Is there a way to combine ^ofi_rxm and verbs?



From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 2:41 PM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Niyaz,



All you need to do is set the environment variable `FI_PROVIDER=verbs`.



--
Hui Zhou





From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 1:23 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

When testing with libfabric, the verbs provider is selected. I did have to use “-e msg -d mlx5_1” so that it selects verbs.

I was checking whether there is anything like that for the MPICH sample tests. Otherwise, I might need to hack the code to force the selection of verbs.



From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 1:15 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample

Libfabric supports multiple providers. It sounds like it was selecting the sockets or tcp provider rather than a provider that supports RoCE. I am not entirely sure whether the verbs provider will do that. If you can confirm the provider using the libfabric tests, you can then try forcing MPICH to use that provider by setting the FI_PROVIDER environment variable.
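
For example, the libfabric fabtests can confirm that the verbs provider works end to end, roughly along these lines (addresses are placeholders, and you may need to pin the endpoint type and device, e.g. -e msg -d mlx5_1, as in your fi_rma_bw runs):

# server
fi_pingpong -p verbs
# client
fi_pingpong -p verbs <server-ip>

Once that works, the same provider can be forced in MPICH by exporting FI_PROVIDER=verbs before (or on) the mpirun command line.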



--
Hui Zhou





From: Niyaz Murshed via discuss <discuss at mpich.org>
Date: Wednesday, June 12, 2024 at 9:03 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: [mpich-discuss] Mpich over RDMA sample



Hello,



I am trying to learn about MPICH and its performance over RDMA.

I am using libfabric and installed MPICH with the configure line below.



./configure --prefix=/opt/mpich/  --with-ofi=/opt/libfabric/
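
For context, the application I run is essentially the hello-world example shipped with MPICH; a minimal equivalent (my own sketch, not the exact examples/ source), compiled with mpicc and launched with mpirun, looks like:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* Set up MPI and query this process's rank and the total number of ranks. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}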



When I run applications between two directly connected servers with Mellanox NICs, I see that the communication happens over TCP and not over RoCE.

Is there any way to test communication over RoCE?



For example, I was able to test it for libfabric using the sample below, which ships with libfabric to test RMA.

Is there something similar for MPICH, or a way to make the current sample use RoCE via some parameter? (A rough MPI RMA sketch follows the commands below.)



Server :

fi_rma_bw -s   192.168.1.100  -e msg   -d mlx5_1 -S 1024 -I 1

Client :

fi_rma_bw -s   192.168.1.200  -e msg   -d mlx5_3  192.168.1.100  -S 1024 -I 1
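
For the MPI side, the closest analogue to fi_rma_bw that I can think of is MPI's one-sided (RMA) interface. A minimal sketch (illustration only, not a tuned benchmark) that puts a buffer from rank 0 into rank 1's window:

#include <stdio.h>
#include <mpi.h>

#define COUNT 1024

int main(int argc, char *argv[])
{
    int rank, size;
    double buf[COUNT];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Rank 0 prepares the data it will push to rank 1. */
    if (rank == 0)
        for (int i = 0; i < COUNT; i++)
            buf[i] = (double) i;

    /* Every rank exposes its buffer as an RMA window. */
    MPI_Win_create(buf, COUNT * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0)
        /* One-sided write of rank 0's buffer into rank 1's window. */
        MPI_Put(buf, COUNT, MPI_DOUBLE, 1, 0, COUNT, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 received buf[%d] = %f\n", COUNT - 1, buf[COUNT - 1]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Run it with at least two ranks, e.g. mpirun -n 2 -hosts <host1>,<host2> ./a.out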





Regards,

Niyaz

