[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used

Halim Amer aamer at anl.gov
Thu Jun 4 11:49:12 CDT 2015


Hi Jeff,

I don't think this is a correct program. If the send is successfully 
cancelled, then the origin has to satisfy the matching receive at the 
destination with another send. The hang is therefore the expected result.

This is what the standard says (P102):

"...or that the send is successfully cancelled, in which case no part of 
the message was received at the destination. Then, any matching receive 
has to be satisfied by another send."
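
To make the counting argument concrete, here is a toy model in plain C. It 
is not MPI code and not how MPICH matches messages; the function name 
unmatched_receives and its parameters are made up for illustration. It only 
counts how many posted receives are left without a live send once some 
sends have been successfully cancelled.

```c
/* Toy model only (hypothetical helper, not part of MPI or MPICH).
 * Counts the receives on one rank that can never complete, given
 * what the neighbouring rank did with its sends. */
int unmatched_receives(int sends_posted,    /* sends issued by the origin          */
                       int sends_cancelled, /* sends whose cancel succeeded        */
                       int resends,         /* replacement sends posted afterwards */
                       int recvs_posted)    /* receives posted at the destination  */
{
    /* A successfully cancelled send delivers nothing, so it cannot
     * match any receive; only the remaining sends count. */
    int live_sends = sends_posted - sends_cancelled + resends;
    int unmatched = recvs_posted - live_sends;

    /* Surplus sends do not leave any receive pending. */
    return unmatched > 0 ? unmatched : 0;
}
```

With n = 1000 as in the program below, unmatched_receives(1000, 1, 0, 1000) 
is 1: that one receive is what MPI_Waitall blocks on. Posting a single 
replacement send, unmatched_receives(1000, 1, 1, 1000), brings it to 0.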

--Halim

Abdelhalim Amer (Halim)
Postdoctoral Appointee
MCS Division
Argonne National Laboratory

On 6/4/15 9:21 AM, Jeff Hammond wrote:
> I can't tell for sure if this is a correct program, but multiple
> members of the MPI Forum suggested it is.
>
> If it is a correct program, it appears to expose a bug in MPICH,
> because the MPI_Waitall hangs.
>
> Thanks,
>
> Jeff
>
> $ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out
>
> $ mpichversion
> MPICH Version:    3.2b1
> MPICH Release date: unreleased development copy
> MPICH Device:    ch3:nemesis
> MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
> F77=gfortran-4.9 --enable-cxx --enable-fortran
> --enable-threads=runtime --enable-g=dbg --with-pm=hydra
> --prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
> --enable-static --enable-shared
> MPICH CC: gcc-4.9    -g -O2
> MPICH CXX: g++-4.9   -g -O2
> MPICH F77: gfortran-4.9   -g -O2
> MPICH FC: gfortran-4.9   -g -O2
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> const int n=1000;
>
> int main(void)
> {
>      MPI_Init(NULL,NULL);
>
>      int size, rank;
>      MPI_Comm_size(MPI_COMM_WORLD, &size);
>      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>      if (size<2) {
>          printf("You must use 2 or more processes!\n");
>          MPI_Finalize();
>          exit(1);
>      }
>
>      MPI_Request reqs[2*n];
>
>      int target = (rank+1)%size;
>      for (int i=0; i<n; i++) {
>          MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
>      }
>
>      srand((unsigned)(rank+MPI_Wtime()));
>      int slot = rand()%n;
>      printf("Cancelling send %d.\n", slot);
>      MPI_Cancel(&reqs[slot]);
>
> #if 1
>      MPI_Barrier(MPI_COMM_WORLD);
> #endif
>
>      int origin = (rank==0) ? (size-1) : (rank-1);
>      for (int i=0; i<n; i++) {
>          MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
>      }
>
>      MPI_Status stats[2*n];
>      MPI_Waitall(2*n,reqs,stats);
>
>      for (int i=0; i<n; i++) {
>          int flag;
>          MPI_Test_cancelled(&(stats[i]),&flag);
>          if (flag) {
>              printf("Status %d indicates cancel was successful.\n", i);
>          }
>      }
>
>      MPI_Finalize();
>      return 0;
> }
>
>

