[mpich-discuss] MPI_Comm_split and end with Finalize

Balaji, Pavan balaji at anl.gov
Sat Feb 27 10:59:11 CST 2016


It's unclear what exactly you are trying to do here.  Why are the clients connecting to the server and then immediately "splitting off"?

Your "split-off" functionality needs to be implemented using MPI_Comm_disconnect, not MPI_Comm_split.  MPI_Comm_split divides a communicator into smaller communicators, but all of the processes remain connected through the original communicator.  As long as the server process is connected to the client processes, it might still receive messages from them, so it cannot simply exit: MPI_Finalize is collective over all connected processes.  MPI_Comm_disconnect, on the other hand, disconnects the client processes from the server process, provided that every communicator spanning both sides (the intercommunicator returned by the accept as well as the merged intracommunicator) has been disconnected.
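
To make the point about MPI_Comm_split concrete, here is a toy model in plain C.  It makes no MPI calls at all: toy_comm, toy_split and toy_connected are hypothetical names invented for this illustration, and "connected" is simplified to "some communicator still contains both ranks" (MPI also counts indirect paths).  It is only a sketch of the color/key bookkeeping, not of how MPICH implements anything.

```c
#include <assert.h>

#define MAX_PROCS 16

/* Toy stand-in for a communicator: the parent-communicator ranks of its
   members, in new-rank order.  This is NOT an MPI type. */
typedef struct {
    int size;
    int members[MAX_PROCS];
} toy_comm;

/* Model of what process `me` gets back from a split in which parent rank r
   passed color[r] and key[r]: every rank with the same color as `me`,
   ordered by key, with ties broken by parent rank. */
toy_comm toy_split(int nprocs, const int color[], const int key[], int me)
{
    toy_comm out = { 0, { 0 } };
    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    assert(me >= 0 && me < nprocs);
    for (int r = 0; r < nprocs; r++) {
        if (color[r] != color[me])
            continue;
        /* Insertion sort step: shift members with a larger key to the right.
           The strict '>' keeps equal keys in parent-rank order. */
        int pos = out.size;
        while (pos > 0 && key[out.members[pos - 1]] > key[r]) {
            out.members[pos] = out.members[pos - 1];
            pos--;
        }
        out.members[pos] = r;
        out.size++;
    }
    return out;
}

/* Returns 1 if `rank` is a member of communicator c, else 0. */
int toy_contains(const toy_comm *c, int rank)
{
    for (int i = 0; i < c->size; i++)
        if (c->members[i] == rank)
            return 1;
    return 0;
}

/* Simplified connectedness: ranks a and b count as connected while at least
   one communicator that has not been released contains both of them. */
int toy_connected(const toy_comm comms[], int ncomms, int a, int b)
{
    for (int i = 0; i < ncomms; i++)
        if (toy_contains(&comms[i], a) && toy_contains(&comms[i], b))
            return 1;
    return 0;
}
```

For the case in the quoted message (rank 0 is the server, rank 1 the client, color = (rank == 0), key = rank), toy_split hands the server a communicator holding only rank 0 and the client one holding only rank 1.  If the merged communicator is still in the list passed to toy_connected, ranks 0 and 1 are reported as connected; they stop being connected only once the communicators spanning both of them are taken out of the list, which is the role MPI_Comm_disconnect plays in a real program.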

But then again, I have no idea why you are connecting to the server and disconnecting immediately.

  -- Pavan

> On Feb 26, 2016, at 5:31 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
> 
> Hello all,
> 
> I have recently begun working on a project that uses MPICH-3.2 and I am trying to resolve an issue where a server process busy waits at MPI_Finalize.
> 
> We are trying to create a server process that accepts incoming connections from a known number of clients (say, N clients), forms a new communicator amongst everyone (server and clients) and then splits itself from the group and terminates, so that the clients now only work with each other.
> 
> For very problem-specific reasons, we cannot simply run
> 'mpiexec -np N (other args)'
> 
> So we have a server that publishes a service name to a nameserver, and the clients look up that name to join the server. The server and client processes are started with separate calls to mpiexec: one call to start the server and N further calls to start the clients.
> 
> The server process busy-waits at the MPI_Finalize call, after it splits from the communicator and only finishes when all other clients reach their MPI_Finalize too. 
> 
> Consider a simplified case of only one server and one client. The simplified pseudocode is:
> 
> Server process:
> MPI_Init();
> MPI_Open_port(...);
> MPI_Publish_name(...); //publish service name to nameserver
> 
> MPI_Comm_accept(...); // accept an incoming connection and store it in intercomm
> MPI_Intercomm_merge(...);  // merge new client into intra-comm
> 
> // now split the server from the client
> MPI_Comm_rank(intra comm, rank); // rank=0
> MPI_Comm_split(intra comm, (rank==0), rank, lone comm);
> 
> MPI_Finalize(); // busy-waits here till client's sleep duration
> 
> Client process: (simplified - assuming only one client is trying to connect)
> MPI_Init();
> MPI_Lookup_name(...);
> MPI_Comm_connect(...);
> 
> // merge
> MPI_Intercomm_merge(...); // merge with server
> 
> // get rank and split
> MPI_Comm_rank(intra comm, rank);  // rank=1
> MPI_Comm_split(intra comm, rank==0, rank, lone comm);
> 
> sleep(10); // sleep for 10 seconds - causes server to busy wait at MPI_Finalize for sleep duration
> 
> MPI_Finalize(); // server and client finish here
> 
> So my questions are:
> 
> 1) Is busy-wait at MPI_Finalize the expected behaviour?
> 
> 2) How to truly "disconnect" the server, so that it can end immediately at MPI_Finalize()? I had tried MPI_Comm_disconnect (also MPI_Comm_free) on both the server and client, but that didn't help.
> 
> 3) We don't want to see the server process consuming one core at 100% while it waits at MPI_Finalize. Are there other alternatives, apart from making the server process sleep, wake up and keep polling a client, and then finally call MPI_Finalize?
> 
> Thank you for any inputs that you can give here.
> 
> 
> Regards,
> K.N.Ramachandran
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
