[mpich-discuss] Hang in MPI_Win_Flush during scenario in one-sided communication

Neil Spruit nrspruit at gmail.com
Mon Sep 22 16:53:37 CDT 2014


Hello,



I have been experimenting with MPICH's implementation of one-sided
communication with no remote synchronization (passive-target RMA), which
was introduced in the MPI 3.0 spec, and I have run into a hang during
connect/accept MPI scenarios.



Here is the situation: I have two applications, launched separately with
mpiexec on two different hosts, that connect to each other after startup.



After connecting, both hosts allocate memory and call MPI_Win_create as
follows (note host1 exposes no memory in its window, while host2 exposes
its mmap'd buffer):

Host1_program:

    void* buffer = mmap(0, buf_length, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE | MAP_ANON, -1, 0);
    MPI_Intercomm_merge(connected_comm, 1, &comm);
    MPI_Comm_rank(comm, &myrank);
    MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, comm, &win);

Host2_program:

    void* buffer = mmap(0, buf_length, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_POPULATE | MAP_ANON, -1, 0);
    MPI_Intercomm_merge(connected_comm, 1, &comm);
    MPI_Comm_rank(comm, &myrank);
    MPI_Win_create(buffer, buf_length, 1, MPI_INFO_NULL, comm, &win);





Once the MPI window has been created, I execute the following on host1:

Host1_program (continued):

    err = MPI_Win_lock(MPI_LOCK_SHARED, target_rank, 0, win);
    if (err != MPI_SUCCESS)
        return 1;

    err = MPI_Put(buffer, buf_length, MPI_BYTE, target_rank, 0,
                  buf_length, MPI_BYTE, win);
    if (err != MPI_SUCCESS)
        return 1;

    err = MPI_Win_flush(target_rank, win);
    if (err != MPI_SUCCESS)
        return 1;

    err = MPI_Win_unlock(target_rank, win);
    if (err != MPI_SUCCESS)
        return 1;

While this is running, host2’s program is simply sleeping in a getchar().



This should run to completion; however, if the target host has not called
another MPI function (such as MPI_Recv) and is instead sitting in a sleep,
getchar(), etc., the MPI_Win_flush above simply hangs.



Why does the one-sided operation hang until another MPI operation is
called on the target host? My understanding is that MPI_Win_flush should
not require any action by the target host to complete the operation.



This is a stack trace of where the flush gets stuck:

#0  0x0000003b456dea28 in poll () from /lib64/libc.so.6
#1  0x00007fe1c08004ee in MPID_nem_tcp_connpoll () from /usr/local/lib/libmpich.so.12
#2  0x00007fe1c07efab7 in MPIDI_CH3I_Progress () from /usr/local/lib/libmpich.so.12
#3  0x00007fe1c07d35ff in MPIDI_CH3I_Wait_for_lock_granted () from /usr/local/lib/libmpich.so.12
#4  0x00007fe1c07d88bf in MPIDI_Win_flush () from /usr/local/lib/libmpich.so.12
#5  0x00007fe1c08c1b46 in PMPI_Win_flush () from /usr/local/lib/libmpich.so.12
#6  0x000000000040115c in main ()



According to this (the origin is stuck in MPIDI_CH3I_Wait_for_lock_granted),
it seems the remote host is holding something after MPI_Win_create that, in
my scenario, gets released only if I call MPI_Recv on the connected_comm
from above after the MPI_Win_create.



Thank you for your time and I look forward to your reply.



Respectfully,

Neil Spruit


P.S. MPICH version info:

    Version:      3.1
    Release Date: Thu Feb 20 11:41:13 CST 2014