[mpich-discuss] Mpich over RDMA sample

Niyaz Murshed Niyaz.Murshed at arm.com
Wed Jun 12 14:54:17 CDT 2024


Thank you for the reply.

I did that; however, it selects ofi_rxm:verbs


libfabric:412098:1718221892::ofi_rxm:av:ofi_av_insert_addr():313<info> fi_addr: 1
libfabric:412099:1718221892::ofi_rxm:av:ofi_av_insert_addr():313<info> fi_addr: 1

options:
  backend:        cpu
  iters:          16
  warmup_iters:   16
  cache:          1
  min_elem_count: 1
  max_elem_count: 1
  elem_counts:    [1]
  validate:       last
  window_size:    64
#------------------------------------------------------------
# Benchmarking: Bandwidth
# #processes: 2
#------------------------------------------------------------

        #bytes      #repetitions        Mbytes/sec
             4                16              0.02

# All done

libfabric:412098:1718221892::ofi_rxm:ep_ctrl:rxm_stop_listen():864<info> stopping CM thread
libfabric:412099:1718221892::ofi_rxm:ep_ctrl:rxm_stop_listen():864<info> stopping CM thread



As per https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/ofi-providers-support.html , we need to add FI_PROVIDER=^ofi_rxm, but if I do that, it falls back to the sockets provider.
Is there a way to combine ^ofi_rxm and verbs?

From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 2:41 PM
To: Niyaz Murshed <Niyaz.Murshed at arm.com>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
Niyaz,

All you need to do is to set an environment variable `FI_PROVIDER=verbs`.
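
As a minimal sketch of that suggestion (not part of the original message), the variable can be exported in the shell that launches the job. FI_LOG_LEVEL=info is an optional libfabric setting that makes provider activity visible in the log. The mpiexec line is left commented out because the benchmark path is a placeholder, and the two addresses are simply the ones used in the fi_rma_bw test elsewhere in this thread.

```shell
#!/bin/sh
# Ask libfabric (and therefore MPICH's OFI netmod) to use the verbs provider.
export FI_PROVIDER=verbs

# Optional: info-level libfabric logging shows which provider gets opened.
export FI_LOG_LEVEL=info

# Hypothetical launch line: ./your_benchmark is a placeholder, and the host
# addresses are assumed to be the two directly connected servers.
# mpiexec -n 2 -hosts 192.168.1.100,192.168.1.200 ./your_benchmark

echo "FI_PROVIDER=$FI_PROVIDER FI_LOG_LEVEL=$FI_LOG_LEVEL"
```

The final echo only confirms what the launching shell will pass on; the libfabric log lines remain the real evidence of which provider was selected.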

--
Hui Zhou


From: Niyaz Murshed <Niyaz.Murshed at arm.com>
Date: Wednesday, June 12, 2024 at 1:23 PM
To: Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
Cc: nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
When testing with libfabric, the verbs provider is selected. I did have to use “-e msg -d mlx5_1” so that it selects verbs.
I was checking whether there is anything like that for the MPICH sample tests. Otherwise I might need to hack the code to force the selection of verbs.

From: Zhou, Hui <zhouh at anl.gov>
Date: Wednesday, June 12, 2024 at 1:15 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: Re: Mpich over RDMA sample
Libfabric supports multiple providers. It sounds like it was selecting the sockets or tcp provider rather than a provider that supports RoCE. I am not exactly sure whether the verbs provider will do that. If you can confirm the provider using the libfabric tests, then you can try forcing MPICH to use that provider by setting the FI_PROVIDER environment variable.
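
One way to do the "confirm the provider" step, sketched here under the assumption that the fi_info utility shipped with libfabric is installed and in PATH: fi_info -p <name> restricts its listing to a single provider, so its exit status can serve as a rough availability check. The variable names are illustrative only.

```shell
#!/bin/sh
# Provider to look for; "verbs" is the one discussed in this thread.
provider=verbs

if command -v fi_info >/dev/null 2>&1; then
    # fi_info -p <name> lists only the interfaces offered by that provider.
    if fi_info -p "$provider" >/dev/null 2>&1; then
        status="available"
    else
        status="not reported by fi_info"
    fi
else
    status="unknown (fi_info not in PATH)"
fi

echo "libfabric provider $provider: $status"
```

Running fi_info -p verbs without the redirection prints the matching entries, which also shows the exact provider names libfabric reports on that machine.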

--
Hui Zhou


From: Niyaz Murshed via discuss <discuss at mpich.org>
Date: Wednesday, June 12, 2024 at 9:03 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Niyaz Murshed <Niyaz.Murshed at arm.com>, nd <nd at arm.com>
Subject: [mpich-discuss] Mpich over RDMA sample
Hello,

I am trying to learn about MPICH and its performance over RDMA.
I am using libfabric and built MPICH with the configure line below.

./configure --prefix=/opt/mpich/  --with-ofi=/opt/libfabric/

When I run any application between 2 directly connected servers with Mellanox NICs, I see that communication is happening over TCP and not over RoCE.
Is there any way to test communication over RoCE?

For example, I was able to test it for libfabric using the sample below, which ships with libfabric, to test RMA.
Is there something similar for MPICH, or can the current sample be made to use RoCE via some parameter?

Server:
fi_rma_bw -s 192.168.1.100 -e msg -d mlx5_1 -S 1024 -I 1
Client:
fi_rma_bw -s 192.168.1.200 -e msg -d mlx5_3 192.168.1.100 -S 1024 -I 1


Regards,
Niyaz
