[mpich-discuss] MPI_Comm_split and end with Finalize

Halim Amer aamer at anl.gov
Mon Feb 29 16:39:15 CST 2016


MPI_Finalize is collective over a set of connected processes. If the 
server hangs in MPI_Finalize, it means that it is still connected to a 
subset of the clients. It is difficult to know the reason without a 
concrete piece of code. If you send us a minimal example that reproduces 
the problem, we might be able to identify the issue.
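
For illustration, here is a rough, untested sketch of the server side
(the service name is only a placeholder). The key point is that every
communicator derived from the intercommunicator (the merged intracomm,
the result of MPI_Comm_split, and so on) must be freed, and the
intercommunicator itself disconnected, before MPI_Finalize is reached:

#include <mpi.h>

int main(int argc, char **argv)
{
    char     port[MPI_MAX_PORT_NAME];
    MPI_Comm client;    /* intercommunicator returned by MPI_Comm_accept */

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("my_service", MPI_INFO_NULL, port);  /* placeholder name */

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    /* ... merge, split, exchange setup data, then free every communicator
     * derived from the intercommunicator (merged intracomm, split result) ... */

    MPI_Comm_disconnect(&client);    /* sever the connection to the client */

    MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize();                  /* should no longer wait on the client */
    return 0;
}

With N clients the same applies to each intercommunicator returned by
MPI_Comm_accept: all of them need to be disconnected before MPI_Finalize
on the server can return.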

--Halim

www.mcs.anl.gov/~aamer

On 2/29/16 9:00 AM, K. N. Ramachandran wrote:
> Hello all,
>
> I had tried calling MPI_Comm_disconnect instead of MPI_Comm_split.
>
> I tried this on just the server side, as well as on both the server
> and the client, but I still see the busy-wait at MPI_Finalize on the
> server side. Can anyone offer further input on this?
>
> It looks like the server process should be able to terminate early, but
> is held up by the client, even though they should be disconnected from
> each other.
>
>
>
> On Sat, Feb 27, 2016 at 7:00 PM, K. N. Ramachandran
> <knram06 at gmail.com> wrote:
>
>     Hi Pavan,
>
>     Thank you for the reply. I have presented only a very simplified
>     case of one server and one client, which is why the problem looks
>     strange.
>
>     The general case is one server acting as a meeting point: N
>     clients join the server, and one intracomm is formed among them all.
>     The server then splits off and terminates, leaving the intracomm to
>     the clients so they can work amongst themselves.
>
>     I had also tried MPI_Comm_disconnect on the server, after calling
>     MPI_Comm_split, but even in that case the server busy-waits for the
>     client at MPI_Finalize. The single-server, single-client case was
>     only to demonstrate the problem I am facing.
>
>     Please let me know if you might need more information. Thanks.
>
>     On Sat, Feb 27, 2016 at 11:59 AM, Balaji, Pavan
>     <balaji at anl.gov> wrote:
>
>
>         It's unclear what exactly you are trying to do here.  Why are
>         the clients connecting to the server and then immediately
>         "splitting off"?
>
>         Your "split-off" functionality needs to be implemented using
>         MPI_Comm_disconnect, not using MPI_Comm_split.  Comm_split
>         divides a communicator into smaller communicators, but all
>         processes are still very much connected.  So as long as the
>         server process is connected to the client processes, it might
>         still receive messages from them and thus cannot simply
>         exit.  Comm_disconnect, on the other hand, disconnects
>         the client processes from the server processes.
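>
>         Roughly (an untested sketch; the variable names just follow
>         your pseudocode below), the end of both the server and the
>         client would then look like:
>
>             MPI_Comm_free(&intracomm);        /* the merged intracomm  */
>             MPI_Comm_disconnect(&intercomm);  /* sever server<->client */
>             MPI_Finalize();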
>
>         But then again, I have no idea why you are connecting to the
>         server and disconnecting immediately.
>
>            -- Pavan
>
>          > On Feb 26, 2016, at 5:31 PM, K. N. Ramachandran
>         <knram06 at gmail.com> wrote:
>          >
>          > Hello all,
>          >
>          > I have recently begun working on a project that uses
>         MPICH-3.2, and I am trying to resolve an issue where a server
>         process busy-waits at MPI_Finalize.
>          >
>          > We are trying to create a server process that accepts
>         incoming connections from a known number of clients (say, N
>         clients), forms a new communicator amongst everyone (server and
>         clients), and then splits itself off from the group and
>         terminates, so that the clients then work only with each other.
>          >
>          > For problem-specific reasons, we cannot simply run
>          > 'mpiexec -np N (other args)'.
>          >
>          > So we have a server that publishes a service name to a
>         nameserver, and clients look up the name to join the server. The
>         server and client processes are started with separate calls to
>         mpiexec: one to start the server, and N more to start the
>         clients.
>          >
>          > The server process busy-waits at the MPI_Finalize call after
>         it splits from the communicator, and it only finishes when all
>         the clients reach their MPI_Finalize too.
>          >
>          > Consider a simplified case of only one server and one client.
>         The simplified pseudocode is:
>          >
>          > Server process:
>          > MPI_Init(...);
>          > MPI_Open_port(...);
>          > MPI_Publish_name(...); // publish service name to nameserver
>          >
>          > MPI_Comm_accept(...); // accept incoming connection into intercomm
>          > MPI_Intercomm_merge(...); // merge new client into intracomm
>          >
>          > // now split the server from the client
>          > MPI_Comm_rank(intracomm, &rank); // rank == 0
>          > MPI_Comm_split(intracomm, (rank == 0), rank, &lonecomm);
>          >
>          > MPI_Finalize(); // busy-waits here until the client's sleep finishes
>          >
>          > Client process: (simplified - assuming only one client is
>          > trying to connect)
>          > MPI_Init(...);
>          > MPI_Lookup_name(...);
>          > MPI_Comm_connect(...);
>          >
>          > // merge
>          > MPI_Intercomm_merge(...); // merge with server
>          >
>          > // get rank and split
>          > MPI_Comm_rank(intracomm, &rank); // rank == 1
>          > MPI_Comm_split(intracomm, (rank == 0), rank, &lonecomm);
>          >
>          > sleep(10); // sleep for 10 seconds - causes the server to busy-wait
>          > // at MPI_Finalize for the sleep duration
>          >
>          > MPI_Finalize(); // server and client finish here
>          >
>          > So my questions are:
>          >
>          > 1) Is busy-wait at MPI_Finalize the expected behaviour?
>          >
>          > 2) How do we truly "disconnect" the server, so that it can end
>         immediately at MPI_Finalize()? I had tried MPI_Comm_disconnect
>         (and also MPI_Comm_free) on both the server and client, but that
>         didn't help.
>          >
>          > 3) We don't want to see the server process consuming one
>         core at 100% while it waits at MPI_Finalize. Are there
>         alternatives other than having the server process sleep, wake
>         up, and keep polling the client before finally calling
>         MPI_Finalize?
>          >
>          > Thank you for any input you can give here.
>          >
>          >
>          > Regards,
>          > K.N.Ramachandran
>
>
>
>
>
>     Regards,
>     K.N.Ramachandran
>
>
>
>
> Regards,
> K.N.Ramachandran
>
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

