[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used

Jeff Hammond jeff.science at gmail.com
Thu Jun 4 12:19:32 CDT 2015


Thanks for pointing that out.  It runs correctly now.  Sorry for the
stupid question.
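
For the archives, the pattern the standard requires looks roughly like
this (a sketch reusing the names from the program below; a cancelled
request must still be completed, and only then can you learn whether
the cancel actually succeeded):

     MPI_Cancel(&reqs[slot]);
     MPI_Status st;
     /* a cancel must be followed by a completion call */
     MPI_Wait(&reqs[slot], &st);
     int cancelled;
     MPI_Test_cancelled(&st, &cancelled);
     if (cancelled) {
         /* no part of the message was received, so the matching
            receive at the destination must be satisfied by another
            send */
         MPI_Issend(NULL, 0, MPI_BYTE, target, 0, MPI_COMM_WORLD,
                    &reqs[slot]);
     }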

On Thu, Jun 4, 2015 at 11:49 AM, Halim Amer <aamer at anl.gov> wrote:
> Hi Jeff,
>
> I don't think it is a correct program. If the send is successfully
> cancelled, then the origin has to satisfy the matching receive at the
> destination with another send. The hang is an expected result.
>
> This is what the standard says (p. 102):
>
> "...or that the send is successfully cancelled, in which case no part of the
> message was received at the destination. Then, any matching receive has to
> be satisfied by another send."
>
> --Halim
>
> Abdelhalim Amer (Halim)
> Postdoctoral Appointee
> MCS Division
> Argonne National Laboratory
>
>
> On 6/4/15 9:21 AM, Jeff Hammond wrote:
>>
>> I can't tell for sure if this is a correct program, but multiple
>> members of the MPI Forum suggested it is.
>>
>> If it is a correct program, it appears to expose a bug in MPICH,
>> because the MPI_Waitall hangs.
>>
>> Thanks,
>>
>> Jeff
>>
>> $ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out
>>
>> $ mpichversion
>> MPICH Version:    3.2b1
>> MPICH Release date: unreleased development copy
>> MPICH Device:    ch3:nemesis
>> MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
>> F77=gfortran-4.9 --enable-cxx --enable-fortran
>> --enable-threads=runtime --enable-g=dbg --with-pm=hydra
>> --prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
>> --enable-static --enable-shared
>> MPICH CC: gcc-4.9    -g -O2
>> MPICH CXX: g++-4.9   -g -O2
>> MPICH F77: gfortran-4.9   -g -O2
>> MPICH FC: gfortran-4.9   -g -O2
>>
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> const int n=1000;
>>
>> int main(void)
>> {
>>      MPI_Init(NULL,NULL);
>>
>>      int size, rank;
>>      MPI_Comm_size(MPI_COMM_WORLD, &size);
>>      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>      if (size<2) {
>>          printf("You must use 2 or more processes!\n");
>>          MPI_Finalize();
>>          exit(1);
>>      }
>>
>>      MPI_Request reqs[2*n];
>>
>>      int target = (rank+1)%size;
>>      for (int i=0; i<n; i++) {
>>          MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
>>      }
>>
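>>      /* cancel one randomly chosen send; a successful cancel means
>>         that message is never delivered to the target */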
>>      srand((unsigned)(rank+MPI_Wtime()));
>>      int slot = rand()%n;
>>      printf("Cancelling send %d.\n", slot);
>>      MPI_Cancel(&reqs[slot]);
>>
>> #if 1
>>      MPI_Barrier(MPI_COMM_WORLD);
>> #endif
>>
>>      int origin = (rank==0) ? (size-1) : (rank-1);
>>      for (int i=0; i<n; i++) {
>>          MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
>>      }
>>
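>>      /* if any rank's cancel succeeded, one receive here has no
>>         matching send, so this MPI_Waitall cannot complete */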
>>      MPI_Status stats[2*n];
>>      MPI_Waitall(2*n,reqs,stats);
>>
>>      for (int i=0; i<n; i++) {
>>          int flag;
>>          MPI_Test_cancelled(&(stats[i]),&flag);
>>          if (flag) {
>>              printf("Status %d indicates cancel was successful.\n", i);
>>          }
>>      }
>>
>>      MPI_Finalize();
>>      return 0;
>> }
>>
>>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

