[mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Edric Ellis eellis at mathworks.com
Wed Feb 2 02:29:41 CST 2022


Ok, thanks for explaining that. I think it should be possible for us to arrange things to avoid calling MPI_Cancel on send requests. We do this currently because we’re using MPI communication in an interactive environment where the user can tell us to stop at any time – the reproduction steps I posted were rather simplified compared to the actual case we were hitting. We already have mechanisms in place to “flush” messages at the receiving side when necessary, so that covers part of what we’ll need.

Cheers,
Edric.

From: Zhou, Hui via discuss <discuss at mpich.org>
Sent: 02 February 2022 00:16
To: discuss at mpich.org
Cc: Zhou, Hui <zhouh at anl.gov>; Edric Ellis <eellis at mathworks.com>
Subject: Re: [mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Hi Edric,

After some pondering, we are closing https://github.com/pmodels/mpich/issues/5775 as wontfix, for the following reasons:


  *   Cancelling send requests is deprecated in MPI-4.
  *   The potential fix will be very messy due to the current design.
  *   On the other hand, it is probably easier to work around in the application.

An example workaround could be to replace MPI_Issend with MPI_Isend plus a separate MPI_Irecv for the acknowledgement. You only need to do this for the MPI_Issend calls that you may cancel, and I think it is a small inconvenience for more predictable behavior. Internally, MPI_Issend is not implemented as MPI_Isend + MPI_Irecv, so that it can be more efficient and exploit hardware acceleration where we can, but that also makes cancelling it very complicated.

If you have strong reasons to fix this issue, for example no feasible workaround, let's discuss.

--
Hui Zhou

________________________________
From: Zhou, Hui <zhouh at anl.gov>
Sent: Wednesday, January 19, 2022 12:28 PM
To: discuss at mpich.org
Cc: Edric Ellis <eellis at mathworks.com>
Subject: Re: MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Thanks for reporting. We'll fix this. Tracking issue: https://github.com/pmodels/mpich/issues/5775

________________________________
From: Edric Ellis via discuss <discuss at mpich.org>
Sent: Wednesday, January 19, 2022 11:21 AM
To: discuss at mpich.org
Cc: Edric Ellis <eellis at mathworks.com>
Subject: [mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Hi,

Running one of our test programs using MPICH 3.4.3 and ch4:ofi, I notice that MPI_Wait on an MPI_Request that has been MPI_Cancelled never completes (it does when using ch3). (The documentation for MPI_Cancel states "If a communication is marked for cancellation, then a MPI_WAIT call for that communication is guaranteed to return, irrespective of the activities of other processes (i.e., MPI_WAIT behaves as a local function)")

Here's a simple example:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

void check(int const value) {
    if (value != MPI_SUCCESS) {
        fprintf(stderr, "Failed.\n");
        exit(1);
    }
}

int main(int argc, char** argv) {
    MPI_Request r1;
    int payload = 42;
    int result;

    check(MPI_Init(0,0));
    check(MPI_Issend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r1));
    check(MPI_Test(&r1, &result, MPI_STATUS_IGNORE));
    fprintf(stdout, "MPI_Test result: %d\n", result);
    check(MPI_Cancel(&r1));
    check(MPI_Wait(&r1, MPI_STATUS_IGNORE));
    MPI_Finalize();
    return 0;
}

This stalls in MPI_Wait when executed using "mpiexec -n 1 ./a.out".

Cheers,
Edric.

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss