[mpich-discuss] Mpich over RDMA sample
Zhou, Hui
zhouh at anl.gov
Wed Jun 12 15:41:11 CDT 2024
Niyaz,
ofi_rxm:verbs is the correct provider. The ofi_rxm provides the verbs provider additional messaging semantics that are needed to be used in MPI.
--
Hui Zhou
From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 2:54 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
Thank you for the reply. I did that; however, it selects ofi_rxm: verbs libfabric: 412098: 1718221892: : ofi_rxm: av: ofi_av_insert_addr(): 313<info> fi_addr: 1 libfabric: 412099: 1718221892: : ofi_rxm: av: ofi_av_insert_addr(): 313<info> fi_addr:
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
ZjQcmQRYFpfptBannerEnd
Thank you for the reply.
I did that; however, it selects ofi_rxm:verbs
libfabric:412098:1718221892::ofi_rxm:av:ofi_av_insert_addr():313<info> fi_addr: 1
libfabric:412099:1718221892::ofi_rxm:av:ofi_av_insert_addr():313<info> fi_addr: 1
options:
backend: cpu
iters: 16
warmup_iters: 16
cache: 1
min_elem_count: 1
max_elem_count: 1
elem_counts: [1]
validate: last
window_size: 64
#------------------------------------------------------------
# Benchmarking: Bandwidth
# #processes: 2
#------------------------------------------------------------
#bytes #repetitions Mbytes/sec
4 16 0.02
# All done
libfabric:412098:1718221892::ofi_rxm:ep_ctrl:rxm_stop_listen():864<info> stopping CM thread
libfabric:412099:1718221892::ofi_rxm:ep_ctrl:rxm_stop_listen():864<info> stopping CM thread
As per https://urldefense.us/v3/__https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/ofi-providers-support.html__;!!G_uCfscf7eWS!cWrnAm56I8fpldwZuqZDCwFJyN4ir3pYek0UwldNmvWEveOh6jWnGtppa41DfLP-TSUmEC7-OWl1$ <https://urldefense.us/v3/__https:/www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/ofi-providers-support.html__;!!G_uCfscf7eWS!YajPk9G-sPCEDt44nrZQrYG8r7V_s953AsKjQ4w_vW5OzcjJXfdGHOINW-PEYIw-IcISZSCw1xIsIoJ-dbE$> , we need to add FI_PROVIDER=^ofi_rxm , but if do that, it moves back to sockets provider.
Is here a way to combine ^ofi_rxm and verbs
From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 2:41 PM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
Niyaz,
All you need to do is to set an environment variable `FI_PROVIDER=verbs`.
--
Hui Zhou
From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 1:23 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
When testing with Libfabric, verbs provider is selected. I did have to use “-e msg -d mlx5_1” so that it selects verbs. I was checking if there is anything like that for mpich sample tests. Else might need to do some hack in the code to force
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
ZjQcmQRYFpfptBannerEnd
When testing with Libfabric, verbs provider is selected. I did have to use “-e msg -d mlx5_1” so that it selects verbs.
I was checking if there is anything like that for mpich sample tests. Else might need to do some hack in the code to force the selection of verbs.
From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 1:15 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
Libfabric support multiple providers. Sounds like it was selecting the sockets or tcp provider rather than a provider that support RoCE. I am not exactly sure whether the verbs provider will do that. If you can confirm the provider using libfabric tests, then you can try forcing MPICH to use that provider by setting the FI_PROVIDER environment variable.
--
Hui Zhou
From: Niyaz Murshed via discuss <discuss at mpich.org>
Date: Wednesday, June 12, 2024 at 9:03 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: [mpich-discuss] Mpich over RDMA sample
Hello, I am trying to learn about MPICH and its performance over RDMA. I am using libfabric and installed mpich using the below configure. ./configure --prefix=/opt/mpich/ --with-ofi=/opt/libfabric/ When I run any applications between 2 directly
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
ZjQcmQRYFpfptBannerEnd
Hello,
I am trying to learn about MPICH and its performance over RDMA.
I am using libfabric and installed mpich using the below configure.
./configure --prefix=/opt/mpich/ --with-ofi=/opt/libfabric/
When I run any applications between 2 directly connected servers having Mellanox NICs, I see that communication is happening over tcp and not over RoCE.
Is there any way to test commination over RoCE ?
For eg. I was able to test it for libfabric using the below sample that comes along with libfabric to test RMA.
Is there something similar for MPICH ? or use the current sample to use RoCE by some parameter?
Server :
fi_rma_bw -s 192.168.1.100 -e msg -d mlx5_1 -S 1024 -I 1
Client :
fi_rma_bw -s 192.168.1.200 -e msg -d mlx5_3 192.168.1.100 -S 1024 -I 1
Regards,
Niyaz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20240612/22a36ef9/attachment-0001.html>
More information about the discuss
mailing list