[mpich-discuss] Supporting > 64K ranks in CH4/UCX netmod

Amit Ruhela aruhela at tacc.utexas.edu
Thu Apr 8 10:24:25 CDT 2021

We have seen this issue with Intel MPI as well, and the solution was to set the following two variables.

The total length is 41 bits, and it can be split between the rank and tag fields to suit the number of ranks and tags you need.

Amit Ruhela
From: Zhou, Hui via discuss <discuss at mpich.org>
Sent: Thursday, April 8, 2021 10:18 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Zhou, Hui <zhouh at anl.gov>
Subject: Re: [mpich-discuss] Supporting > 64K ranks in CH4/UCX netmod

Hi Min,

I think we can do something about it. We’ll follow up when we have updates.

Hui Zhou

From: M Xie via discuss <discuss at mpich.org>
Date: Thursday, April 8, 2021 at 12:30 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: M Xie <xmxmxie at gmail.com>
Subject: [mpich-discuss] Supporting > 64K ranks in CH4/UCX netmod


I am using MPICH 3.4.1 (mpich-3.4.1) with the CH4/UCX netmod.

I noticed that there is a configure option "--with-ch4-rank-bits" that sets the value of CH4_RANK_BITS, but CH4_RANK_BITS does not seem to be used anywhere in the code.

I also found that in netmod/ucx/ucx_impl.h, _UCX_init_tag()/_UCX_recv_tag() use only 16 bits for the MPI rank in the ucp_tag, so the ucp_tag can no longer identify the correct source rank once the number of MPI ranks exceeds 64K.

In Open MPI, the pml/ucx module splits the 64-bit ucp_tag into 24 bits for the MPI tag, 20 bits for the source rank, and 20 bits for the context ID, so it can address up to 1M (2^20) ranks.

Is there any plan to support > 64K ranks in MPICH/CH4/UCX?

