[mpich-discuss] Iprobe does not find all messages.
Hudson, Stephen Tobias P
shudson at anl.gov
Mon Jun 28 16:30:17 CDT 2021
Hi,
TL;DR: Why does iprobe not pick up messages that should be there?
In this use-case, a manager is calling iprobe (via mpi4py) to look for messages received from various workers.
The receive function (1.) loops over the workers, calling iprobe for each one, and returns only after a complete pass over all of them receives nothing. Each worker may have sent several messages.
loop:
1. receive_func (loop over all workers doing iprobe - until none have anything)
2. do some other stuff.
(actual code: https://github.com/Libensemble/libensemble/blob/main/libensemble/manager.py#L346)
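For reference, the receive step (1.) described above can be sketched roughly as follows. This is a minimal illustration, not the libEnsemble code linked above. It assumes mpi4py-style `Iprobe(source=..., tag=...)` (returning a bool) and `recv(source=..., tag=...)` methods, such as those on `mpi4py.MPI.COMM_WORLD`. The function name `drain_messages` and the `handle` callback are hypothetical. It is written against any object with those two methods, so the control flow can be exercised without launching MPI.

```python
from typing import Any, Callable, Iterable


def drain_messages(comm: Any, workers: Iterable[int], tag: int,
                   handle: Callable[[int, Any], None]) -> int:
    """Hypothetical receive step: probe each worker in turn and return only
    after one complete pass over all workers finds no pending message.

    `comm` is assumed to provide mpi4py-style Iprobe(source=, tag=) -> bool
    and recv(source=, tag=) methods. Returns the number of messages received.
    """
    workers = list(workers)
    received = 0
    while True:
        found_any = False
        for w in workers:
            # Non-blocking check: is a matching message from worker w visible now?
            if comm.Iprobe(source=w, tag=tag):
                msg = comm.recv(source=w, tag=tag)
                handle(w, msg)  # hypothetical per-message processing
                received += 1
                found_any = True
        if not found_any:
            # A full pass saw nothing, so leave the receive step.
            return received
```

With mpi4py this would be called as something like `drain_messages(MPI.COMM_WORLD, range(1, size), tag, handle)`, where `handle` stands in for whatever the manager does with each message.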
The workers are sending with isend, and I know they have passed the isend call for all the required messages. On the manager side, a single call to receive_func does not necessarily pick up all the messages (but if I replace each iprobe call with its own loop of several iprobes, it does).
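The workaround mentioned above (several iprobes per worker instead of one) might look like the helper below, used in place of the single iprobe test in the per-worker loop. It is a sketch only. The name `iprobe_repeated` and the default of 5 attempts are hypothetical, and it assumes the same mpi4py-style `Iprobe(source=..., tag=...)` method that returns a bool.

```python
from typing import Any


def iprobe_repeated(comm: Any, source: int, tag: int, attempts: int = 5) -> bool:
    """Hypothetical helper: call Iprobe up to `attempts` times for one source
    and report whether any of those calls found a matching message.

    `attempts=5` is an arbitrary illustrative value, not a recommended setting.
    """
    for _ in range(attempts):
        # Stop at the first probe that reports a matching message.
        if comm.Iprobe(source=source, tag=tag):
            return True
    return False
```

Bounding the number of attempts keeps the manager from spinning indefinitely on one worker that has nothing to send. An unbounded version would simply busy-wait until a message appears.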
In my code, the iprobe is part of a loop that may occasionally spend some time in another function (2.), so I want to better understand why it may not pick up all the messages. Even if I put a delay of a few seconds in (2.), some cycles still do not pick up the messages, as though a certain number of iprobes were required.
In particular, I want to understand whether the messages are already present on the receiving end, perhaps in the unexpected message buffer, but are not picked up by iprobe.
Or is there some delay on the sending end? (From timing, I know the code has passed the isend call for all messages.)
Also, since adding several iprobes (in their own sub-loop) always finds the messages, if the delay is not on the receiver side, is some kind of handshake taking place?
I have seen answers online which suggest that this is expected behavior, but I wonder if there is a fairly simple explanation of why iprobe does not pick everything up. If it's too complicated, that's okay.
I am running on an Ubuntu laptop with:
$ mpirun --version
HYDRA build details:
    Version: 3.2.1
Thanks,
Steve