[mpich-discuss] MPI_Cancel + MPI_Wait stalls when using ch4, not when using ch3

Edric Ellis eellis at mathworks.com
Wed Jan 19 11:21:24 CST 2022


Hi,

When running one of our test programs using MPICH 3.4.3 and ch4:ofi, I notice that MPI_Wait on an MPI_Request that has been passed to MPI_Cancel never returns (it does return when using ch3). The documentation for MPI_Cancel states: "If a communication is marked for cancellation, then a MPI_WAIT call for that communication is guaranteed to return, irrespective of the activities of other processes (i.e., MPI_WAIT behaves as a local function)".

Here's a simple example:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

void check(int const value) {
    if (value != MPI_SUCCESS) {
        fprintf(stderr, "Failed.\n");
        exit(1);
    }
}

int main(int argc, char** argv) {
    MPI_Request r1;
    int payload = 42;
    int result;

    check(MPI_Init(&argc, &argv));
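    /* Synchronous send to self (rank 0); no matching receive is ever posted. */
    check(MPI_Issend(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &r1));
    check(MPI_Test(&r1, &result, MPI_STATUS_IGNORE));
    fprintf(stdout, "MPI_Test result: %d\n", result);
    /* Mark the pending send for cancellation; MPI_Wait should then return locally. */
    check(MPI_Cancel(&r1));
    /* Stalls here with ch4:ofi; returns with ch3. */
    check(MPI_Wait(&r1, MPI_STATUS_IGNORE));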
    MPI_Finalize();
    return 0;
}

This stalls in MPI_Wait when executed using "mpiexec -n 1 ./a.out". 

Cheers,
Edric.


