[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used

Rob Latham robl at mcs.anl.gov
Thu Jun 4 13:00:41 CDT 2015



On 06/04/2015 12:19 PM, Jeff Hammond wrote:
> Thanks for pointing that out.  It runs correctly now.  Sorry for the
> stupid question.

  it just so happens, Jeff, that they've been spending a lot of time 
debugging send cancellation for all our various devices, so "cancel 
semantics" are (more so than usual) quite warm in the cache.

==rob

> On Thu, Jun 4, 2015 at 11:49 AM, Halim Amer <aamer at anl.gov> wrote:
>> Hi Jeff,
>>
>> I don't think it is a correct program. If the send is successfully
>> canceled, then the origin has to satisfy the destination's matching
>> receive with another send. The hang is therefore an expected result.
>>
>> This is what the standard says (P102):
>>
>> "...or that the send is successfully cancelled, in which case no part of the
>> message was received at the destination. Then, any matching receive has to
>> be satisfied by another send."
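
To make that rule concrete, here is a minimal sketch (not part of the
original thread) of the pattern the quoted text implies: complete the
cancelled request, check whether the cancel actually succeeded, and if it
did, issue a replacement send so the destination's matching receive can
still complete. Run with 2 processes.

#include <stdio.h>
#include <mpi.h>

int main(void)
{
    MPI_Init(NULL, NULL);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request req;
        MPI_Status status;
        int cancelled;

        MPI_Issend(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Cancel(&req);
        /* a cancelled request must still be completed */
        MPI_Wait(&req, &status);
        MPI_Test_cancelled(&status, &cancelled);
        if (cancelled) {
            /* no part of the message reached rank 1, so satisfy its
             * matching receive with another send */
            printf("Cancel succeeded; sending a replacement message.\n");
            MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        }
    } else if (rank == 1) {
        /* completes via the original send (cancel failed) or the
         * replacement send (cancel succeeded) */
        MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Jeff's reproducer below never posts such a replacement send for the
cancelled slot, which is why, under this reading of the standard, one of
the 2*n requests passed to MPI_Waitall can never complete.
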
>>
>> --Halim
>>
>> Abdelhalim Amer (Halim)
>> Postdoctoral Appointee
>> MCS Division
>> Argonne National Laboratory
>>
>>
>> On 6/4/15 9:21 AM, Jeff Hammond wrote:
>>>
>>> I can't tell for sure if this is a correct program, but multiple
>>> members of the MPI Forum suggested it is.
>>>
>>> If it is a correct program, it appears to expose a bug in MPICH,
>>> because the MPI_Waitall hangs.
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>> $ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out
>>>
>>> $ mpichversion
>>> MPICH Version:    3.2b1
>>> MPICH Release date: unreleased development copy
>>> MPICH Device:    ch3:nemesis
>>> MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
>>> F77=gfortran-4.9 --enable-cxx --enable-fortran
>>> --enable-threads=runtime --enable-g=dbg --with-pm=hydra
>>> --prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
>>> --enable-static --enable-shared
>>> MPICH CC: gcc-4.9    -g -O2
>>> MPICH CXX: g++-4.9   -g -O2
>>> MPICH F77: gfortran-4.9   -g -O2
>>> MPICH FC: gfortran-4.9   -g -O2
>>>
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <mpi.h>
>>>
>>> const int n=1000;
>>>
>>> int main(void)
>>> {
>>>       MPI_Init(NULL,NULL);
>>>
>>>       int size, rank;
>>>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>       if (size<2) {
>>>           printf("You must use 2 or more processes!\n");
>>>           MPI_Finalize();
>>>           exit(1);
>>>       }
>>>
>>>       MPI_Request reqs[2*n];
>>>
>>>       int target = (rank+1)%size;
>>>       for (int i=0; i<n; i++) {
>>>           MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
>>>       }
>>>
>>>       srand((unsigned)(rank+MPI_Wtime()));
>>>       int slot = rand()%n;
>>>       printf("Cancelling send %d.\n", slot);
>>>       MPI_Cancel(&reqs[slot]);
>>>
>>> #if 1
>>>       MPI_Barrier(MPI_COMM_WORLD);
>>> #endif
>>>
>>>       int origin = (rank==0) ? (size-1) : (rank-1);
>>>       for (int i=0; i<n; i++) {
>>>           MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
>>>       }
>>>
>>>       MPI_Status stats[2*n];
>>>       MPI_Waitall(2*n,reqs,stats);
>>>
>>>       for (int i=0; i<n; i++) {
>>>           int flag;
>>>           MPI_Test_cancelled(&(stats[i]),&flag);
>>>           if (flag) {
>>>               printf("Status %d indicates cancel was successful.\n", i);
>>>           }
>>>       }
>>>
>>>       MPI_Finalize();
>>>       return 0;
>>> }
>>>
>>>
>
>
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

