[mpich-discuss] Supporting > 64K ranks in CH4/UCX netmod

Amit Ruhela aruhela at tacc.utexas.edu
Thu Apr 8 10:24:25 CDT 2021


We have seen this issue with Intel MPI as well and the solution was to set the following two variables.
export MPIR_CVAR_CH4_OFI_RANK_BITS=17
export MPIR_CVAR_CH4_OFI_TAG_BITS=$((41 - MPIR_CVAR_CH4_OFI_RANK_BITS))

The rank bits and tag bits together total 41, and the split between them can be adjusted for the desired number of ranks and tags.
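
As a rough illustration of that trade-off, the arithmetic can be sketched in C. This is only a sketch: the 41-bit total is taken from the message above, and the helper names (budget_tag_bits, budget_value_count) are hypothetical, not part of MPICH or Intel MPI.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the message above: rank bits + tag bits = 41. */
enum { BUDGET_TOTAL_BITS = 41 };

/* Tag bits left over once rank_bits are reserved for the rank. */
static int budget_tag_bits(int rank_bits)
{
    assert(rank_bits > 0 && rank_bits < BUDGET_TOTAL_BITS);
    return BUDGET_TOTAL_BITS - rank_bits;
}

/* Number of distinct values representable in `bits` bits. */
static uint64_t budget_value_count(int bits)
{
    assert(bits >= 0 && bits < 64);
    return UINT64_C(1) << bits;
}
```

With 17 rank bits, budget_tag_bits(17) is 24, so the split leaves room for 131072 distinct ranks and 16777216 distinct tag values. Each extra rank bit doubles the rank range and halves the tag range.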

Thanks,
Amit Ruhela
________________________________
From: Zhou, Hui via discuss <discuss at mpich.org>
Sent: Thursday, April 8, 2021 10:18 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Zhou, Hui <zhouh at anl.gov>
Subject: Re: [mpich-discuss] Supporting > 64K ranks in CH4/UCX netmod


Hi Min,



I think we can do something about it. We’ll follow up when we have updates.



--
Hui Zhou





From: M Xie via discuss <discuss at mpich.org>
Date: Thursday, April 8, 2021 at 12:30 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: M Xie <xmxmxie at gmail.com>
Subject: [mpich-discuss] Supporting > 64K ranks in CH4/UCX netmod

Hi,



I am using MPICH with the CH4/UCX netmod; the version is mpich-3.4.1.



I noticed that there is a configure parameter "--with-ch4-rank-bits" which sets the value of CH4_RANK_BITS, but it seems that CH4_RANK_BITS is not used in the code.



I also found that in netmod/ucx/ucx_impl.h, _UCX_init_tag()/_UCX_recv_tag() use only 16 bits for the MPI rank in the ucp_tag, so ucp_tags can no longer be told apart correctly once the number of MPI ranks exceeds 64K.
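
To make the problem concrete, here is a minimal sketch of what a 16-bit rank field does to large ranks. It is not the MPICH code: the names DEMO_RANK_BITS, DEMO_RANK_MASK and demo_rank_field are hypothetical, and the only assumption carried over from the source is the 16-bit width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit rank field, matching the width described above. */
#define DEMO_RANK_BITS 16
#define DEMO_RANK_MASK ((UINT64_C(1) << DEMO_RANK_BITS) - 1)

/* Keep only the low DEMO_RANK_BITS bits of the rank,
 * as a fixed-width field inside a 64-bit tag would. */
static uint64_t demo_rank_field(uint32_t rank)
{
    return (uint64_t) rank & DEMO_RANK_MASK;
}
```

In this sketch, rank 65541 (65536 + 5) yields the same field value as rank 5, so a receive matching on that field could not tell the two senders apart.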



In Open MPI, the pml/ucx module uses 20 bits for the rank in the ucp_tag, 20 bits for the context, and 24 bits for the MPI tag, so the maximum number of ranks in Open MPI is 1M.
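
A 20/20/24 split of a 64-bit tag could look roughly like the sketch below. The field widths come from the description above; the field order (tag in the high bits, then rank, then context) and all sk_* names are assumptions made for illustration and are not copied from Open MPI or MPICH.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the description above; they sum to 64 bits. */
enum { SK_RANK_BITS = 20, SK_CTX_BITS = 20, SK_TAG_BITS = 24 };

/* Mask with the low `bits` bits set (bits must be below 64). */
static uint64_t sk_mask(int bits)
{
    return (UINT64_C(1) << bits) - 1;
}

/* Pack tag, rank and context into one 64-bit value.
 * Assumed order, high to low: tag | rank | context. */
static uint64_t sk_pack(uint32_t tag, uint32_t rank, uint32_t ctx)
{
    assert(tag <= sk_mask(SK_TAG_BITS));
    assert(rank <= sk_mask(SK_RANK_BITS));
    assert(ctx <= sk_mask(SK_CTX_BITS));
    return ((uint64_t) tag << (SK_RANK_BITS + SK_CTX_BITS)) |
           ((uint64_t) rank << SK_CTX_BITS) |
           (uint64_t) ctx;
}

/* Extract the rank field from a packed value. */
static uint32_t sk_rank(uint64_t packed)
{
    return (uint32_t) ((packed >> SK_CTX_BITS) & sk_mask(SK_RANK_BITS));
}
```

With 20 rank bits, ranks up to 1048575 fit in the field, and ranks 5 and 65541 produce different packed values.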



Is there any plan to support > 64K ranks in MPICH/CH4/UCX?



Thanks.



Min