[mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Zhou, Hui zhouh at anl.gov
Tue Feb 1 18:16:12 CST 2022


Hi Edric,

After some pondering, we are closing https://github.com/pmodels/mpich/issues/5775 as wontfix, for the following reasons:


  *   Cancelling a send is deprecated in MPI-4.
  *   A fix would be very messy given the current design.
  *   On the other hand, it is probably easier to work around in the application.

An example workaround is to replace MPI_Issend with MPI_Isend plus a separate MPI_Irecv for the acknowledgement. You only need to do this for the MPI_Issend calls that you may cancel, and I think it is a small inconvenience in exchange for more predictable behavior. Internally, MPI_Issend is not implemented as MPI_Isend + MPI_Irecv, so that it can be more efficient and exploit hardware acceleration where possible, but that also makes cancelling it very complicated.

If you have a strong reason for needing this fixed, for example no feasible workaround, let's discuss.

--
Hui Zhou

________________________________
From: Zhou, Hui <zhouh at anl.gov>
Sent: Wednesday, January 19, 2022 12:28 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Edric Ellis <eellis at mathworks.com>
Subject: Re: MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Thanks for reporting. We'll fix this. Tracking issue: https://github.com/pmodels/mpich/issues/5775

________________________________
From: Edric Ellis via discuss <discuss at mpich.org>
Sent: Wednesday, January 19, 2022 11:21 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Edric Ellis <eellis at mathworks.com>
Subject: [mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Hi,

Running one of our test programs using MPICH 3.4.3 and ch4:ofi, I notice that MPI_Wait on an MPI_Request that has been MPI_Cancelled never completes (it does when using ch3). (The documentation for MPI_Cancel states "If a communication is marked for cancellation, then a MPI_WAIT call for that communication is guaranteed to return, irrespective of the activities of other processes (i.e., MPI_WAIT behaves as a local function)")

Here's a simple example:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

void check(int const value) {
    if (value != MPI_SUCCESS) {
        fprintf(stderr, "Failed.\n");
        exit(1);
    }
}

int main(int argc, char** argv) {
    MPI_Request r1;
    int payload = 42;
    int result;

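    check(MPI_Init(&argc, &argv));
    /* Synchronous-mode send to rank 0 (ourselves); no matching receive is ever posted. */
    check(MPI_Issend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r1));
    check(MPI_Test(&r1, &result, MPI_STATUS_IGNORE));
    fprintf(stdout, "MPI_Test result: %d\n", result);
    check(MPI_Cancel(&r1));
    /* Stalls here with ch4:ofi; returns with ch3. */
    check(MPI_Wait(&r1, MPI_STATUS_IGNORE));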
    MPI_Finalize();
    return 0;
}

This stalls in MPI_Wait when executed using "mpiexec -n 1 ./a.out".

Cheers,
Edric.

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss