[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used

Jeff Hammond jeff.science at gmail.com
Thu Jun 4 13:08:52 CDT 2015

Yes, the vigorous Forum debate was what prompted my interest in what
is otherwise a useless function ;-)
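
For the archive, Halim's point (quoted below) comes down to counting: each rank posts n receives, but after a successful cancel only n-1 sends from its origin remain to match them, so one receive can never complete and MPI_Waitall blocks. A minimal sketch of that bookkeeping in plain C follows. It makes no MPI calls, and the names unmatched_receives and report are hypothetical helpers made up for illustration, not part of MPI or MPICH.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not an MPI call: models how many of a rank's posted
 * receives can never complete. Each send that survives cancellation, plus
 * each replacement send, satisfies exactly one matching receive. */
int unmatched_receives(int sends_posted, int sends_cancelled,
                       int sends_reposted, int recvs_posted)
{
    assert(sends_cancelled >= 0 && sends_cancelled <= sends_posted);
    assert(sends_reposted >= 0);
    int sends_delivered = sends_posted - sends_cancelled + sends_reposted;
    int unmatched = recvs_posted - sends_delivered;
    return unmatched > 0 ? unmatched : 0;
}

/* Prints the two cases discussed in the thread for n messages per rank. */
void report(int n)
{
    printf("cancel, no resend: %d receive(s) left pending\n",
           unmatched_receives(n, 1, 0, n));
    printf("cancel, one resend: %d receive(s) left pending\n",
           unmatched_receives(n, 1, 1, n));
}
```

With n=1000 as in the test program, report(1000) shows one pending receive in the first case and none in the second. In the MPI program itself, the second case would presumably mean completing the cancelled request with MPI_Wait, checking MPI_Test_cancelled on its status, and, if the flag is set, posting one more MPI_Issend to target before the final MPI_Waitall.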

On Thu, Jun 4, 2015 at 1:00 PM, Rob Latham <robl at mcs.anl.gov> wrote:
> On 06/04/2015 12:19 PM, Jeff Hammond wrote:
>> Thanks for pointing that out.  It runs correctly now.  Sorry for the
>> stupid question.
> It just so happens, Jeff, that they've been spending a lot of time
> debugging cancel send operations for all our various devices and so "cancel
> semantics" are (more so than usual) quite warm in the cache.
> ==rob
>> On Thu, Jun 4, 2015 at 11:49 AM, Halim Amer <aamer at anl.gov> wrote:
>>> Hi Jeff,
>>> I don't think it is a correct program. If the send is correctly canceled
>>> then the origin has to satisfy the destination with another send. The hang
>>> is an expected result.
>>> This is what the standard says (P102):
>>> "...or that the send is successfully cancelled, in which case no part of
>>> the message was received at the destination. Then, any matching receive
>>> has to be satisfied by another send."
>>> --Halim
>>> Abdelhalim Amer (Halim)
>>> Postdoctoral Appointee
>>> MCS Division
>>> Argonne National Laboratory
>>> On 6/4/15 9:21 AM, Jeff Hammond wrote:
>>>> I can't tell for sure if this is a correct program, but multiple
>>>> members of the MPI Forum suggested it is.
>>>> If it is a correct program, it appears to expose a bug in MPICH,
>>>> because the MPI_Waitall hangs.
>>>> Thanks,
>>>> Jeff
>>>> $ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out
>>>> $ mpichversion
>>>> MPICH Version:    3.2b1
>>>> MPICH Release date: unreleased development copy
>>>> MPICH Device:    ch3:nemesis
>>>> MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
>>>> F77=gfortran-4.9 --enable-cxx --enable-fortran
>>>> --enable-threads=runtime --enable-g=dbg --with-pm=hydra
>>>> --prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
>>>> --enable-static --enable-shared
>>>> MPICH CC: gcc-4.9    -g -O2
>>>> MPICH CXX: g++-4.9   -g -O2
>>>> MPICH F77: gfortran-4.9   -g -O2
>>>> MPICH FC: gfortran-4.9   -g -O2
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <mpi.h>
>>>> const int n=1000;
>>>> int main(void)
>>>> {
>>>>       MPI_Init(NULL,NULL);
>>>>       int size, rank;
>>>>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>       if (size<2) {
>>>>           printf("You must use 2 or more processes!\n");
>>>>           MPI_Finalize();
>>>>           exit(1);
>>>>       }
>>>>       MPI_Request reqs[2*n];
>>>>       int target = (rank+1)%size;
>>>>       for (int i=0; i<n; i++) {
>>>>           MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
>>>>       }
>>>>       srand((unsigned)(rank+MPI_Wtime()));
>>>>       int slot = rand()%n;
>>>>       printf("Cancelling send %d.\n", slot);
>>>>       MPI_Cancel(&reqs[slot]);
>>>> #if 1
>>>>       MPI_Barrier(MPI_COMM_WORLD);
>>>> #endif
>>>>       int origin = (rank==0) ? (size-1) : (rank-1);
>>>>       for (int i=0; i<n; i++) {
>>>>           MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
>>>>       }
>>>>       MPI_Status stats[2*n];
>>>>       MPI_Waitall(2*n,reqs,stats);
>>>>       for (int i=0; i<n; i++) {
>>>>           int flag;
>>>>           MPI_Test_cancelled(&(stats[i]),&flag);
>>>>           if (flag) {
>>>>               printf("Status %d indicates cancel was successful.\n", i);
>>>>           }
>>>>       }
>>>>       MPI_Finalize();
>>>>       return 0;
>>>> }
>>> _______________________________________________
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/devel
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA

Jeff Hammond
jeff.science at gmail.com
