[mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3
Jeff Hammond
jeff.science at gmail.com
Tue Feb 1 23:06:07 CST 2022
Instead of stalling, it would be better to detect send cancel and throw an
error.
Jeff
On Wed, Feb 2, 2022 at 2:16 AM Zhou, Hui via discuss <discuss at mpich.org>
wrote:
> Hi Edric,
>
> After some pondering, we are closing
> https://github.com/pmodels/mpich/issues/5775 as wontfix, for the
> following reasons:
>
>
> - Cancelling send is deprecated in MPI-4.
> - The potential fix would be very messy due to the current design.
> - On the other hand, it is probably easier to work around in the
> application.
>
> An example workaround would be to replace MPI_Issend with `MPI_Isend`
> plus a separate MPI_Irecv for the acknowledgement. You only need to do
> this for the MPI_Issend that you may cancel, and I think it is a small
> inconvenience for more predictable behavior. Internally, an MPI_Issend
> is not implemented as an MPI_Isend + MPI_Irecv, so that it is more
> efficient and can exploit hardware acceleration where available, but
> that also makes cancelling it very complicated.
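>
> A minimal sketch of that workaround follows. It is only an illustration:
> the tag values, the helper names, and the application-level handling
> after a cancel are assumptions, not part of MPICH.
>
> /* Sketch: emulate MPI_Issend with MPI_Isend plus a zero-byte
>  * acknowledgement receive. DATA_TAG, ACK_TAG, and the helper names
>  * are illustrative; the receiver must cooperate by sending the ack. */
> #include <mpi.h>
>
> #define DATA_TAG 0
> #define ACK_TAG  1
>
> /* Sender side: post the payload send plus an ack receive. Waiting on
>  * ack_req gives synchronous-send-like completion; to abandon the
>  * operation, cancel ack_req (cancelling a receive is still supported)
>  * and handle a possibly delivered payload at the application level. */
> void issend_workaround(const void *buf, int count, MPI_Datatype type,
>                        int dest, MPI_Comm comm,
>                        MPI_Request *data_req, MPI_Request *ack_req)
> {
>     MPI_Isend(buf, count, type, dest, DATA_TAG, comm, data_req);
>     MPI_Irecv(NULL, 0, MPI_BYTE, dest, ACK_TAG, comm, ack_req);
> }
>
> /* Receiver side: after matching the payload, answer with the ack. */
> void recv_with_ack(void *buf, int count, MPI_Datatype type,
>                    int src, MPI_Comm comm)
> {
>     MPI_Recv(buf, count, type, src, DATA_TAG, comm, MPI_STATUS_IGNORE);
>     MPI_Send(NULL, 0, MPI_BYTE, src, ACK_TAG, comm);
> }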
>
> If you have strong reasons for us to fix this issue, for example, no
> feasible workaround exists, let's discuss.
>
> --
> Hui Zhou
>
> ------------------------------
> *From:* Zhou, Hui <zhouh at anl.gov>
> *Sent:* Wednesday, January 19, 2022 12:28 PM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Edric Ellis <eellis at mathworks.com>
> *Subject:* Re: MPI_Cancel + MPI_Wait stalls when using ch4, not when
> using ch3
>
> Thanks for reporting. We'll fix this. Tracking issue:
> https://github.com/pmodels/mpich/issues/5775
>
> ------------------------------
> *From:* Edric Ellis via discuss <discuss at mpich.org>
> *Sent:* Wednesday, January 19, 2022 11:21 AM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Edric Ellis <eellis at mathworks.com>
> *Subject:* [mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4,
> not when using ch3
>
> Hi,
>
> Running one of our test programs using MPICH 3.4.3 and ch4:ofi, I notice
> that MPI_Wait on an MPI_Request that has been MPI_Cancelled never completes
> (it does when using ch3). (The documentation for MPI_Cancel states "If a
> communication is marked for cancellation, then a MPI_WAIT call for that
> communication is guaranteed to return, irrespective of the activities of
> other processes (i.e., MPI_WAIT behaves as a local function)")
>
> Here's a simple example:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> void check(int const value) {
>     if (value != MPI_SUCCESS) {
>         fprintf(stderr, "Failed.\n");
>         exit(1);
>     }
> }
>
> int main(int argc, char** argv) {
>     MPI_Request r1;
>     int payload = 42;
>     int result;
>
>     check(MPI_Init(0, 0));
>     check(MPI_Issend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r1));
>     check(MPI_Test(&r1, &result, MPI_STATUS_IGNORE));
>     fprintf(stdout, "MPI_Test result: %d\n", result);
>     check(MPI_Cancel(&r1));
>     check(MPI_Wait(&r1, MPI_STATUS_IGNORE));
>     MPI_Finalize();
>     return 0;
> }
>
> This stalls in MPI_Wait when executed using "mpiexec -n 1 ./a.out".
>
> Cheers,
> Edric.
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/