[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used
Halim Amer
aamer at anl.gov
Thu Jun 4 11:49:12 CDT 2015
Hi Jeff,
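I don't think this is a correct program. If the send is successfully cancelled,
the origin has to satisfy the matching receive at the destination with another
send. The program never issues that send, so one of the receives posted at the
destination can never complete, and the hang in MPI_Waitall is the expected result.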
This is what the standard says (p. 102):
"...or that the send is successfully cancelled, in which case no part of
the message was received at the destination. Then, any matching receive
has to be satisfied by another send."
--Halim
Abdelhalim Amer (Halim)
Postdoctoral Appointee
MCS Division
Argonne National Laboratory
On 6/4/15 9:21 AM, Jeff Hammond wrote:
> I can't tell for sure if this is a correct program, but multiple
> members of the MPI Forum suggested it is.
>
> If it is a correct program, it appears to expose a bug in MPICH,
> because the MPI_Waitall hangs.
>
> Thanks,
>
> Jeff
>
> $ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out
>
> $ mpichversion
> MPICH Version: 3.2b1
> MPICH Release date: unreleased development copy
> MPICH Device: ch3:nemesis
> MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
> F77=gfortran-4.9 --enable-cxx --enable-fortran
> --enable-threads=runtime --enable-g=dbg --with-pm=hydra
> --prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
> --enable-static --enable-shared
> MPICH CC: gcc-4.9 -g -O2
> MPICH CXX: g++-4.9 -g -O2
> MPICH F77: gfortran-4.9 -g -O2
> MPICH FC: gfortran-4.9 -g -O2
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
> const int n=1000;
>
> int main(void)
> {
> MPI_Init(NULL,NULL);
>
> int size, rank;
> MPI_Comm_size(MPI_COMM_WORLD, &size);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> if (size<2) {
> printf("You must use 2 or more processes!\n");
> MPI_Finalize();
> exit(1);
> }
>
> MPI_Request reqs[2*n];
>
> int target = (rank+1)%size;
> for (int i=0; i<n; i++) {
> MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
> }
>
> srand((unsigned)(rank+MPI_Wtime()));
> int slot = rand()%n;
> printf("Cancelling send %d.\n", slot);
> MPI_Cancel(&reqs[slot]);
>
> #if 1
> MPI_Barrier(MPI_COMM_WORLD);
> #endif
>
> int origin = (rank==0) ? (size-1) : (rank-1);
> for (int i=0; i<n; i++) {
> MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
> }
>
> MPI_Status stats[2*n];
> MPI_Waitall(2*n,reqs,stats);
>
> for (int i=0; i<n; i++) {
> int flag;
> MPI_Test_cancelled(&(stats[i]),&flag);
> if (flag) {
> printf("Status %d indicates cancel was successful.\n", i);
> }
> }
>
> MPI_Finalize();
> return 0;
> }
>
>