[mpich-discuss] Differences between ch3:nemesis and ch4:ofi:tcp with MPI_Barrier before completion of MPI_Isend
Raffenetti, Ken
raffenet at anl.gov
Thu Apr 25 13:00:13 CDT 2024
Hi Edric,
I don’t see anything wrong in your pseudocode; I believe it is a correct pattern. I ran some experiments myself on some local machines and could not reproduce a hang, so if you come up with a reproducer, please send it along.
Ken
From: Edric Ellis via discuss <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Wednesday, April 24, 2024 at 3:05 AM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: Edric Ellis <eellis at mathworks.com>
Subject: [mpich-discuss] Differences between ch3:nemesis and ch4:ofi:tcp with MPI_Barrier before completion of MPI_Isend
I'm trying to understand if a change in behaviour I'm seeing is expected or not. My code initiates an MPI_Isend on rank==1, and before waiting for completion of that send, all ranks perform an MPI_Barrier. This works fine on ch3:nemesis. It also works fine on ch4:ofi when the message is small (presumably using the "eager" protocol). When using ch4:ofi (the embedded tcp provider) with a large message (presumably switching to the "rendezvous" protocol), rank 0 never leaves the MPI_Barrier call. (I think the SHM piece of ch4 does not show the problem.)
Should this work? I cannot find anything in the MPI standard that says it should not, but perhaps I'm not looking in the right place.
I'm using mpich-4.1.2 in both cases, either in "--with-device=ch3:nemesis" mode or "--with-libfabric=embedded --with-device=ch4:ofi:tcp".
Here's a sketch of the problematic section of code. I'll attempt to attach a full reproduction (but I'm not sure if that works?):
    // setup code...
    if (rank == 1) {
        MPI_Isend(data, count, MPI_INT, 0, TAG, comm, &req);
    } else {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    MPI_Barrier(comm);
    MPI_Barrier(comm);
    // cleanup code, receive the message etc...
Cheers,
Edric.