[mpich-discuss] Iprobe does not find all messages.

Thakur, Rajeev thakur at anl.gov
Mon Jun 28 21:13:19 CDT 2021


I suspect the reason is that iprobe does not see a message until some MPI call drives progress on the communication. Simply waiting (a delay) does not cause progress.
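A toy model may make this concrete. The class below is hypothetical and is not how MPICH is implemented. It assumes that incoming messages are only moved into the receive queue when the library is entered, and (an arbitrary simplification) that each progress poll moves at most one message. Under those assumptions, sleeping outside the library changes nothing, a single iprobe brings in only part of what the sender has already passed isend for, and repeated iprobe calls eventually drain everything.

```python
import time


class ToyProgressEngine:
    """Hypothetical model, not MPICH internals: sent messages are only
    pulled into the receiver's queue when the library is entered."""

    def __init__(self, in_flight):
        # Messages the sender has handed off (isend returned) but that the
        # receiving side has not yet pulled in.
        self.in_flight = list(in_flight)
        self.arrived = []

    def _progress(self):
        # Assumption: one progress poll moves at most one message.
        if self.in_flight:
            self.arrived.append(self.in_flight.pop(0))

    def iprobe(self):
        # Like an iprobe: drive progress once, then check for a match.
        self._progress()
        return bool(self.arrived)

    def recv(self):
        return self.arrived.pop(0)


engine = ToyProgressEngine(["m1", "m2", "m3"])

time.sleep(0.1)  # waiting outside the library moves nothing
after_sleep = len(engine.arrived)
print(after_sleep)  # 0

engine.iprobe()  # one probe pulls in only one message
after_one_probe = (len(engine.arrived), len(engine.in_flight))
print(after_one_probe)  # (1, 2)

received = []
while engine.iprobe():  # keep entering the library until nothing is left
    received.append(engine.recv())
print(received)  # ['m1', 'm2', 'm3']
```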

Rajeev

From: "Hudson, Stephen Tobias P via discuss" <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Monday, June 28, 2021 at 4:30 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: "Hudson, Stephen Tobias P" <shudson at anl.gov>
Subject: [mpich-discuss] Iprobe does not find all messages.

Hi,

TL;DR: Why does iprobe not pick up messages that should be there?

In this use case, a manager calls iprobe (via mpi4py) to look for messages sent by various workers.

In the receive function (1.), there is a loop over the workers doing an iprobe for each one, and it returns only after a complete pass over all workers in which nothing was received. Each worker may have sent several messages.

loop:
   1. receive_func (loop over all workers doing iprobe - until none have anything)
   2. do some other stuff.

(actual code: https://github.com/Libensemble/libensemble/blob/main/libensemble/manager.py#L346)

The workers send with isend, and I know they have passed the isend call for all the required messages. On the manager side, the iprobe loop does not necessarily pick up all messages in one call to receive_func (but if I replace each iprobe call with its own loop of several iprobes, it does).
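The receive loop and the several-iprobes workaround can be sketched roughly as follows. This is a minimal sketch, not libEnsemble's actual receive_func: the probe and receive operations are passed in as functions (with mpi4py one might pass lambda w: comm.Iprobe(source=w) and lambda w: comm.recv(source=w)), and the hypothetical LaggyMailbox stands in for MPI by making a message visible only after a fixed number of probes of its source. Real MPI gives no such guarantee, so a fixed probes_per_worker is a heuristic, not a fix.

```python
from typing import Any, Callable, Dict, List


def receive_func(
    workers: List[int],
    iprobe: Callable[[int], bool],
    recv: Callable[[int], Any],
    probes_per_worker: int = 1,
) -> List[Any]:
    """Return after a full pass over all workers finds nothing.

    probes_per_worker > 1 mimics the workaround: several iprobe calls
    per worker before concluding that the worker has nothing.
    """
    received = []
    while True:
        got_any = False
        for w in workers:
            for _ in range(probes_per_worker):
                if iprobe(w):
                    received.append(recv(w))
                    got_any = True
                    break
        if not got_any:
            return received


class LaggyMailbox:
    """Hypothetical stand-in for MPI: a pending message becomes visible
    only after `lag` probes of its source since the last receive."""

    def __init__(self, pending: Dict[int, List[Any]], lag: int):
        self.pending = {w: list(msgs) for w, msgs in pending.items()}
        self.lag = lag
        self.polls = {w: 0 for w in pending}

    def iprobe(self, w: int) -> bool:
        self.polls[w] += 1
        return bool(self.pending[w]) and self.polls[w] >= self.lag

    def recv(self, w: int) -> Any:
        self.polls[w] = 0
        return self.pending[w].pop(0)


box = LaggyMailbox({1: ["a", "b"], 2: ["c"]}, lag=3)
one_probe = receive_func([1, 2], box.iprobe, box.recv)
print(one_probe)  # []

box = LaggyMailbox({1: ["a", "b"], 2: ["c"]}, lag=3)
three_probes = receive_func([1, 2], box.iprobe, box.recv, probes_per_worker=3)
print(three_probes)  # ['a', 'c', 'b']
```

With one probe per worker the loop returns empty-handed even though every message has already been sent; with three probes per worker it collects everything.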

In my code, the iprobe is part of a loop that may occasionally spend some time in another function (2.), so I want to better understand why it may not pick up all the messages. Even if I put a delay of a few seconds in (2), there are still cycles that do not pick up all the messages, as though a certain number of iprobes were required.

In particular, I want to understand whether the messages are already present on the receiving end, perhaps in the unexpected message buffer, but are not picked up by the iprobe.
Or is there some delay on the sending end? (Timing shows the code has passed the isend for all messages.)

Also, since adding several iprobes (in their own sub-loop) always finds the messages, if the delay is not on the receiver side, is it doing some kind of handshake?

I have seen answers online suggesting that this is expected behavior, but I wonder if there is a fairly simple explanation of why iprobe does not pick everything up. If it's too complicated, that's okay.

I am running on an Ubuntu laptop. Output of $ mpirun --version:
HYDRA build details:
    Version:                                 3.2.1

Thanks,

Steve