[mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Jeff Hammond jeff.science at gmail.com
Tue Feb 1 23:06:07 CST 2022


Instead of stalling, it would be better to detect send cancel and throw an
error.

Jeff

On Wed, Feb 2, 2022 at 2:16 AM Zhou, Hui via discuss <discuss at mpich.org>
wrote:

> Hi Edric,
>
> After some pondering, we are closing
> https://github.com/pmodels/mpich/issues/5775 as wontfix, for the
> following reasons:
>
>
>    - Cancelling send is deprecated in MPI-4.
>    - The potential fix would be very messy due to the current design.
>    - On the other hand, it is probably easier to work around in the
>    application.
>
> An example workaround is to replace MPI_Issend with MPI_Isend plus a
> separate MPI_Irecv for the acknowledgement. You only need to do this for
> the MPI_Issend calls that you may cancel, and I think it is a small
> inconvenience for more predictable behavior. Internally, MPI_Issend is not
> implemented as MPI_Isend + MPI_Irecv, so that it can be more efficient and
> exploit hardware acceleration where possible, but that also makes
> cancelling it very complicated.
>
> If you have strong reasons to fix this issue, for example, no feasible
> workaround, let's discuss.
>
> --
> Hui Zhou
>
> ------------------------------
> *From:* Zhou, Hui <zhouh at anl.gov>
> *Sent:* Wednesday, January 19, 2022 12:28 PM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Edric Ellis <eellis at mathworks.com>
> *Subject:* Re: MPI_Cancel + MPI_Wait stalls when using ch4, not when
> using ch3
>
> Thanks for reporting. We'll fix this. Tracking issue:
> https://github.com/pmodels/mpich/issues/5775
>
> ------------------------------
> *From:* Edric Ellis via discuss <discuss at mpich.org>
> *Sent:* Wednesday, January 19, 2022 11:21 AM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Edric Ellis <eellis at mathworks.com>
> *Subject:* [mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4,
> not when using ch3
>
> Hi,
>
> Running one of our test programs using MPICH 3.4.3 and ch4:ofi, I notice
> that MPI_Wait on an MPI_Request that has been MPI_Cancelled never completes
> (it does when using ch3). (The documentation for MPI_Cancel states "If a
> communication is marked for cancellation, then a MPI_WAIT call for that
> communication is guaranteed to return, irrespective of the activities of
> other processes (i.e., MPI_WAIT behaves as a local function)")
>
> Here's a simple example:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> /* Exit if an MPI call did not return MPI_SUCCESS. */
> void check(int const value) {
>     if (value != MPI_SUCCESS) {
>         fprintf(stderr, "Failed.\n");
>         exit(1);
>     }
> }
>
> int main(int argc, char** argv) {
>     MPI_Request r1;
>     int payload = 42;
>     int result;
>
>     check(MPI_Init(0,0));
>     /* Synchronous send to rank 0 (ourselves); no receive is ever posted. */
>     check(MPI_Issend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r1));
>     check(MPI_Test(&r1, &result, MPI_STATUS_IGNORE));
>     fprintf(stdout, "MPI_Test result: %d\n", result);
>     /* Cancel the pending send, then wait on it; the wait should return. */
>     check(MPI_Cancel(&r1));
>     check(MPI_Wait(&r1, MPI_STATUS_IGNORE));
>     MPI_Finalize();
>     return 0;
> }
>
> This stalls in MPI_Wait when executed using "mpiexec -n 1 ./a.out".
>
> Cheers,
> Edric.
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/