[mpich-discuss] How locking on multi-VCI works

Guilherme Valarini guilherme.a.valarini at gmail.com
Mon Dec 6 12:16:03 CST 2021

Hello Zhou,

First of all, thanks for the help!

Let me explain my use case a little: my team and I have a distributed
and multithreaded event system implemented on top of MPI, where multiple
non-blocking MPI messages are exchanged between multiple nodes. Checking
our internal performance traces, we saw that there was some contention
happening at the MPI layer, especially when many threads were being used.

Digging a little deeper, we found some studies explaining the current
multithreading support of many MPI implementations, and even limitations
in the standard itself, which might explain the problems we encountered.
Since each event is mapped to a unique tag, tag-based VCI assignment would
be our preferred mechanism for extracting network parallelism. But since we
also want to support other MPI implementations (e.g. Open MPI), we think
that using multiple communicators might be a better option.

I was hoping that messages sent to two different processes but from the
same process through the same VCI would use two different locks at their
origin. But now I see that a VCI is directly mapped to a hardware context
of some sort. So it makes sense that the same lock would be shared between
the two previously described messages.

If you have any other general hints on how to better extract network
parallelism at the MPI level, I would be grateful. 😉

Note: Sorry for the duplicate. I forgot to reply to the mailing list as
well.

Thanks again for the help,
Guilherme Valarini

On Mon, Dec 6, 2021 at 14:34, Zhou, Hui <zhouh at anl.gov> wrote:

> The total number of VCIs is configured with --with-ch4-max-vcis=#. The
> maximum is 64. The default used to be 1, but it was changed to 64 in the
> 4.0b1 release. There is also an option to control the VCI assignment
> method: --enable-ch4-vci-method={communicator,tag,implicit}. The default
> is communicator, with which we assign VCIs to communicators in a
> round-robin fashion. If you create communicators consecutively, they are
> expected to have different VCIs. The other vci-methods are at the
> experimental stage. If the communicator method is insufficient for your
> application, they may be worth a try. We'd like to understand your use
> case better before pointing you that way.
>
> I do not fully understand your question. The VCI locks are local
> process locks, so if you have N VCIs, you will have N channels for each
> process. With vci-method=communicator, the VCIs are matched one-to-one,
> i.e. rank 1 vci 1 only communicates with rank 2 (or any rank in the same
> communicator) vci 1.
> --
> Hui Zhou
> ------------------------------
> *From:* Guilherme Valarini via discuss <discuss at mpich.org>
> *Sent:* Monday, December 6, 2021 10:10 AM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Guilherme Valarini <guilherme.a.valarini at gmail.com>
> *Subject:* [mpich-discuss] How locking on multi-VCI works
> Hello everyone,
> I have a question regarding the multi-VCI support and possible lock
> contention in MPICH in multi-threaded environments.
> I understand that there is a direct mapping between a VCI and a
> communicator, so global locking is avoided in a multi-threaded application.
> But I wanted to know: how do these VCIs work? When I have N VCIs, do I have
> N virtual channels per rank (thus, one global lock per VCI-rank pair) or
> only 2 channels in total (one lock per VCI)? I was wondering if, for
> example, two MPI_Sends targeting different ranks on the same comm might
> need to be synchronized using such a global lock.
> Thanks for the help!
> Guilherme Valarini