[mpich-discuss] Iprobe does not find all messages.
Hudson, Stephen Tobias P
shudson at anl.gov
Tue Jun 29 11:27:02 CDT 2021
Thanks Rajeev. Is this going to be using the rendezvous protocol, in which case the receiver has to receive an envelope and ping back? In that case, it may make sense to me. Does MPICH have a way of controlling message sizes for eager/rendezvous, as Intel MPI does? I haven't found any environment variables.
Thanks,
Steve
________________________________
From: Thakur, Rajeev <thakur at anl.gov>
Sent: Monday, June 28, 2021 9:13 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Hudson, Stephen Tobias P <shudson at anl.gov>
Subject: Re: [mpich-discuss] Iprobe does not find all messages.
I guess the reason is that iprobe does not find a message until some MPI function causes progress on the communication. Just a delay will not cause progress.
Rajeev
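A minimal sketch of the point above, assuming a communicator object that behaves like mpi4py's MPI.COMM_WORLD: instead of sleeping for the whole delay, the wait is split into short sleeps with an iprobe call between them, so the MPI library is entered regularly. The helper name wait_with_progress and the poll_interval parameter are hypothetical, and whether this is enough to drive progress in a given MPICH build is an assumption, not something confirmed in this thread.

```python
import time


def wait_with_progress(comm, seconds, poll_interval=0.01):
    """Wait up to `seconds`, calling iprobe between short sleeps.

    Returns True as soon as iprobe reports a message from any source,
    or False if the deadline passes first. `comm` is assumed to expose
    an mpi4py-style iprobe() that defaults to any source and any tag.
    """
    deadline = time.monotonic() + seconds
    while True:
        # Each iprobe call enters the MPI library, unlike a plain sleep.
        if comm.iprobe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Hypothetical usage with mpi4py:
#   from mpi4py import MPI
#   if wait_with_progress(MPI.COMM_WORLD, seconds=2.0):
#       msg = MPI.COMM_WORLD.recv()
```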
From: "Hudson, Stephen Tobias P via discuss" <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Monday, June 28, 2021 at 4:30 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: "Hudson, Stephen Tobias P" <shudson at anl.gov>
Subject: [mpich-discuss] Iprobe does not find all messages.
Hi,
TLDR: Why does iprobe not pick up messages that should be there?
In this use-case, a manager is calling iprobe (via mpi4py) to look for messages received from various workers.
In the receive function (1.), there is a loop over workers doing an iprobe for each worker, and the function returns only after a full pass over all workers receives nothing. Each worker may have sent several messages.
loop:
1. receive_func (loop over all workers doing iprobe - until none have anything)
2. do some other stuff.
(actual code: https://github.com/Libensemble/libensemble/blob/main/libensemble/manager.py#L346)
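The receive step (1.) described above can be sketched roughly as follows. This is not the libEnsemble code at the link, only an illustration under assumptions: comm is any object with mpi4py-style iprobe(source=...) and recv(source=...) methods, workers is a list of worker ranks, and the function name receive_from_workers is hypothetical.

```python
def receive_from_workers(comm, workers):
    """Sweep over workers with iprobe until one full sweep finds nothing.

    Returns a list of (worker_rank, message) pairs in the order received.
    """
    received = []
    found_any = True
    while found_any:
        found_any = False
        for w in workers:
            # iprobe is non-blocking: True only if a matching message
            # is already visible to the MPI library on this rank.
            if comm.iprobe(source=w):
                received.append((w, comm.recv(source=w)))
                found_any = True
    return received


# Hypothetical usage with mpi4py on the manager rank:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   msgs = receive_from_workers(comm, workers=range(1, comm.Get_size()))
```

Note that the loop exits on the first sweep in which every iprobe returns False, so a message that has been sent but is not yet visible to iprobe is left for the next call.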
The workers are sending with isend, and I know they have passed the isend call for all the required messages. Manager side, the iprobe loop does not necessarily pick up all messages in one call to receive_func (but if I replace each iprobe call with its own loop of several iprobes it does).
In my code, the iprobe is part of a loop which may occasionally spend some time in another function (2.), so I want to better understand why it may not pick up all the messages. Even if I put a delay of a few seconds in (2), it still has cycles that do not pick up the messages - as though a certain number of iprobes are required.
In particular, I want to understand whether the messages are present on the receiving end, maybe in the unexpected message buffer, but may not get picked up by the iprobe?
Or is there some delay on the sending end? (I know from timing that the code has passed the isend for all messages.)
Also, as adding several iprobes (in their own sub-loop) always finds the messages, if the delay is not receiver side, then it must be doing some handshake?
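The sub-loop of several iprobes mentioned above might look like the sketch below. The name probe_with_retries and the default of 5 attempts are hypothetical; the thread does not say how many calls are needed, and nothing here is guaranteed by MPI. It assumes comm has an mpi4py-style iprobe(source=...) method.

```python
def probe_with_retries(comm, source, attempts=5):
    """Call iprobe up to `attempts` times for one source.

    Returns True as soon as a message is reported, False if none of
    the attempts finds one. The attempt count is an arbitrary choice.
    """
    for _ in range(attempts):
        if comm.iprobe(source=source):
            return True
    return False


# Hypothetical usage inside the manager's sweep over workers:
#   if probe_with_retries(comm, source=w):
#       msg = comm.recv(source=w)
```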
I have seen answers online that suggest this is expected behavior, but I wonder if there is a fairly simple explanation as to why iprobe does not pick everything up? If it's too complicated, then okay.
I am running on an Ubuntu laptop:
$ mpirun --version
HYDRA build details:
Version: 3.2.1
Thanks,
Steve