[mpich-devel] MPICH hangs in MPI_Waitall when MPI_Cancel is used
Rob Latham
robl at mcs.anl.gov
Thu Jun 4 13:00:41 CDT 2015
On 06/04/2015 12:19 PM, Jeff Hammond wrote:
> Thanks for pointing that out. It runs correctly now. Sorry for the
> stupid question.
It just so happens, Jeff, that we've been spending a lot of time
debugging cancelled send operations across all our various devices, so
"cancel semantics" are (more so than usual) quite warm in the cache.
==rob
> On Thu, Jun 4, 2015 at 11:49 AM, Halim Amer <aamer at anl.gov> wrote:
>> Hi Jeff,
>>
>> I don't think it is a correct program. If the send is successfully
>> cancelled, then the origin has to satisfy the matching receive at the
>> destination with another send. The hang is therefore expected.
>>
>> This is what the MPI standard says (p. 102):
>>
>> "...or that the send is successfully cancelled, in which case no part of the
>> message was received at the destination. Then, any matching receive has to
>> be satisfied by another send."
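>>
>> In other words (a minimal sketch of the required pattern, not text from
>> the standard; `req` and `target` here are placeholders): the canceller
>> must still complete the request, check whether the cancel actually took
>> effect, and if it did, send again:
>>
>>     MPI_Cancel(&req);
>>     MPI_Status status;
>>     MPI_Wait(&req, &status);  /* a cancelled request must still be completed */
>>     int cancelled;
>>     MPI_Test_cancelled(&status, &cancelled);
>>     if (cancelled) {
>>         /* no part of the message was delivered; the matching
>>            receive must be satisfied by another send */
>>         MPI_Send(NULL, 0, MPI_BYTE, target, 0, MPI_COMM_WORLD);
>>     }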
>>
>> --Halim
>>
>> Abdelhalim Amer (Halim)
>> Postdoctoral Appointee
>> MCS Division
>> Argonne National Laboratory
>>
>>
>> On 6/4/15 9:21 AM, Jeff Hammond wrote:
>>>
>>> I can't tell for sure if this is a correct program, but multiple
>>> members of the MPI Forum suggested it is.
>>>
>>> If it is a correct program, it appears to expose a bug in MPICH,
>>> because the MPI_Waitall hangs.
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>> $ mpicc -g -Wall -std=c99 cancel-sucks.c && mpiexec -n 4 ./a.out
>>>
>>> $ mpichversion
>>> MPICH Version: 3.2b1
>>> MPICH Release date: unreleased development copy
>>> MPICH Device: ch3:nemesis
>>> MPICH configure: CC=gcc-4.9 CXX=g++-4.9 FC=gfortran-4.9
>>> F77=gfortran-4.9 --enable-cxx --enable-fortran
>>> --enable-threads=runtime --enable-g=dbg --with-pm=hydra
>>> --prefix=/opt/mpich/dev/gcc/default --enable-wrapper-rpath
>>> --enable-static --enable-shared
>>> MPICH CC: gcc-4.9 -g -O2
>>> MPICH CXX: g++-4.9 -g -O2
>>> MPICH F77: gfortran-4.9 -g -O2
>>> MPICH FC: gfortran-4.9 -g -O2
>>>
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <mpi.h>
>>>
>>> const int n=1000;
>>>
>>> int main(void)
>>> {
>>>     MPI_Init(NULL,NULL);
>>>
>>>     int size, rank;
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     if (size<2) {
>>>         printf("You must use 2 or more processes!\n");
>>>         MPI_Finalize();
>>>         exit(1);
>>>     }
>>>
>>>     MPI_Request reqs[2*n];
>>>
>>>     /* post n zero-byte synchronous sends to the right neighbor */
>>>     int target = (rank+1)%size;
>>>     for (int i=0; i<n; i++) {
>>>         MPI_Issend(NULL,0,MPI_BYTE,target,0,MPI_COMM_WORLD,&(reqs[i]));
>>>     }
>>>
>>>     /* try to cancel one of the sends, chosen at random */
>>>     srand((unsigned)(rank+MPI_Wtime()));
>>>     int slot = rand()%n;
>>>     printf("Cancelling send %d.\n", slot);
>>>     MPI_Cancel(&reqs[slot]);
>>>
>>> #if 1
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>> #endif
>>>
>>>     /* post n matching receives from the left neighbor */
>>>     int origin = (rank==0) ? (size-1) : (rank-1);
>>>     for (int i=0; i<n; i++) {
>>>         MPI_Irecv(NULL,0,MPI_BYTE,origin,0,MPI_COMM_WORLD,&(reqs[n+i]));
>>>     }
>>>
>>>     /* if any cancel succeeded, one receive is never matched: hang */
>>>     MPI_Status stats[2*n];
>>>     MPI_Waitall(2*n,reqs,stats);
>>>
>>>     for (int i=0; i<n; i++) {
>>>         int flag;
>>>         MPI_Test_cancelled(&(stats[i]),&flag);
>>>         if (flag) {
>>>             printf("Status %d indicates cancel was successful.\n", i);
>>>         }
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>>
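>>
>> A minimal fix to the program above (a sketch, untested) would complete
>> the cancel and re-issue the send before the receives are posted,
>> replacing the bare MPI_Cancel call with something like:
>>
>>     MPI_Cancel(&reqs[slot]);
>>     MPI_Status cstat;
>>     MPI_Wait(&reqs[slot], &cstat);  /* complete the cancelled request;
>>                                        reqs[slot] becomes MPI_REQUEST_NULL */
>>     int was_cancelled;
>>     MPI_Test_cancelled(&cstat, &was_cancelled);
>>     if (was_cancelled) {
>>         /* re-issue so the matching receive is still satisfied;
>>            MPI_Waitall skips the null request if the send went through */
>>         MPI_Issend(NULL, 0, MPI_BYTE, target, 0, MPI_COMM_WORLD, &reqs[slot]);
>>     }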
>
>
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA