[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used

Jeff Hammond jeff.science at gmail.com
Thu Jun 4 09:21:53 CDT 2015


I can't tell for sure whether this is a correct MPI program, but
multiple members of the MPI Forum suggested that it is.

If it is correct, it appears to expose a bug in MPICH, because the
MPI_Waitall call never returns.

Thanks,

Jeff

$ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out

$ mpichversion
MPICH Version:    3.2b1
MPICH Release date: unreleased development copy
MPICH Device:    ch3:nemesis
MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
F77=gfortran-4.9 --enable-cxx --enable-fortran
--enable-threads=runtime --enable-g=dbg --with-pm=hydra
--prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
--enable-static --enable-shared
MPICH CC: gcc-4.9    -g -O2
MPICH CXX: g++-4.9   -g -O2
MPICH F77: gfortran-4.9   -g -O2
MPICH FC: gfortran-4.9   -g -O2


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

const int n=1000;

int main(void)
{
    MPI_Init(NULL,NULL);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size<2) {
        printf("You must use 2 or more processes!\n");
        MPI_Finalize();
        exit(1);
    }

    MPI_Request reqs[2*n];

    /* Post n zero-byte synchronous-mode sends to the next rank in the ring. */
    int target = (rank+1)%size;
    for (int i=0; i<n; i++) {
        MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
    }

    /* Try to cancel one randomly chosen send. */
    srand((unsigned)(rank+MPI_Wtime()));
    int slot = rand()%n;
    printf("Cancelling send %d.\n", slot);
    MPI_Cancel(&reqs[slot]);

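    /* With the barrier enabled, every rank has called MPI_Cancel before
     * any rank posts its receives. */
#if 1
    MPI_Barrier(MPI_COMM_WORLD);
#endif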

    /* Post n zero-byte receives from the previous rank in the ring. */
    int origin = (rank==0) ? (size-1) : (rank-1);
    for (int i=0; i<n; i++) {
        MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
    }

    /* Wait on all 2*n requests (sends and receives); this is the call that hangs. */
    MPI_Status stats[2*n];
    MPI_Waitall(2*n,reqs,stats);

    /* Check the send statuses and report any send that was actually cancelled. */
    for (int i=0; i<n; i++) {
        int flag;
        MPI_Test_cancelled(&(stats[i]),&flag);
        if (flag) {
            printf("Status %d indicates cancel was successful.\n", i);
        }
    }

    MPI_Finalize();
    return 0;
}
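
One way to check whether the program above is correct is to count
matches on a single rank pair. The sketch below is a plain-C model, not
MPI code and not part of the original report; the function name
unmatched_receives and its parameters are hypothetical. It assumes that
a successfully cancelled send never matches a receive, and that a cancel
which fails leaves the send to match as usual.

```c
#include <assert.h>

/* Hypothetical model, not MPI code: counts the receives on one rank pair
 * that are left without a matching send.
 * Assumption: a successfully cancelled send never matches a receive. */
int unmatched_receives(int sends_posted, int sends_cancelled, int recvs_posted)
{
    assert(sends_posted >= 0 && recvs_posted >= 0);
    assert(sends_cancelled >= 0 && sends_cancelled <= sends_posted);

    /* Sends that can still be matched by a receive. */
    int sends_live = sends_posted - sends_cancelled;

    /* Any receive beyond the live sends has nothing to match. */
    return (recvs_posted > sends_live) ? (recvs_posted - sends_live) : 0;
}
```

With the values from the test program (n = 1000 sends and n receives),
the model returns 0 if the cancel fails and 1 if it succeeds. Under the
stated assumption, a successful cancel would leave one MPI_Irecv with
nothing to match, which may be worth ruling out before attributing the
hang to MPICH.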


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

