[mpich-discuss] MPICH Usage Issue: Client process not exiting till server reaches MPI_Finalize() in MPI client server singleton INIT connection

Zhou, Hui zhouh at anl.gov
Thu Jan 27 17:19:40 CST 2022


Hi Rupsa,

Could you try the latest release, mpich-4.0, with your application? The behavior where the client waits for the server to finalize may have been fixed there.

--
Hui Zhou


From: Rupsa Chakraborty via discuss <discuss at mpich.org>
Date: Thursday, January 27, 2022 at 1:42 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Rupsa Chakraborty <c.rupsa at gmail.com>
Subject: [mpich-discuss] MPICH Usage Issue: Client process not exiting till server reaches MPI_Finalize() in MPI client server singleton INIT connection

Hello,

I am asking here because I already asked this question on Stack Overflow and did not receive any answers or comments.

I am trying to implement a feature that performs a large computation in parallel across multiple machines. I am evaluating both ZeroMQ and MPI for this, and will choose whichever gives better performance and is easier to use and better supported. Your help is much appreciated.

I am implementing a new feature, using MPI client-server parallel computation with the 'Singleton INIT' mechanism, in an existing large application on Linux. I am using mpich-3.4.1 for this. The main process becomes an MPI server on certain user inputs. Client processes connect to this server, share a large parallel computation among themselves, and return the result to the server. The client and server processes are started independently (Singleton INIT). All client processes should be able to exit normally after disconnecting from the server, even though the server is still running (doing some other work, but not accepting any more client connections).

My question is:

Is it possible to keep the server process running and executing other tasks, while all of its client connections have closed and client processes exited?

In my client-server MPI code, I see that for the last client connection to a server (corresponding to the last MPI_Comm_accept() call) the client process always gets stuck at its MPI_Finalize() till the server reaches its own MPI_Finalize(). Thus the last client process does not exit till the server reaches its MPI_Finalize() stage. However, for the previous client connections the client processes are able to normally exit.

On the server side, 'lsof' shows that each call to MPI_Comm_accept() releases the file descriptor left over from the previous connection. For the last connection there is no subsequent MPI_Comm_accept() call, so its descriptor is not released until the MPI_Finalize() call.
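
The lsof observation can also be tracked from inside the server by counting its open descriptors before and after each MPI call of interest. A minimal sketch, assuming Linux (it reads /proc/self/fd); the function name countOpenFds is made up for illustration and is not part of MPI or of the code in this post.

```cpp
#include <dirent.h>
#include <cstring>

// Hypothetical helper: number of file descriptors currently open in this
// process, or -1 if /proc/self/fd cannot be read. Linux-specific.
int countOpenFds() {
  DIR* dir = opendir("/proc/self/fd");
  if (!dir) return -1;
  int count = 0;
  while (dirent* entry = readdir(dir)) {
    // Skip the "." and ".." directory entries.
    if (std::strcmp(entry->d_name, ".") != 0 &&
        std::strcmp(entry->d_name, "..") != 0) {
      ++count;
    }
  }
  closedir(dir);
  // The directory stream itself held one descriptor while we were counting.
  return count - 1;
}
```

Logging countOpenFds() around MPI_Comm_accept(), MPI_Comm_disconnect() and MPI_Finalize() in the server would show at which call the descriptor for a given connection goes away.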

What am I missing or doing incorrectly? In the following code, how can I make the last client exit as well, rather than wait in MPI_Finalize(), even though the server is still running?

Is there any way to set a timeout on MPI_Comm_accept(), or to interrupt it from a separate thread?

My server and client code is as follows:

//server.cxx
#include <mpi.h>
#include <iostream>

int main() {
  char ch;
  MPI_Init(NULL, NULL);

  // Open a port and publish its name so that clients can find the server.
  char portName[MPI_MAX_PORT_NAME];
  MPI_Open_port(MPI_INFO_NULL, portName);

  publishServerPortNameToFile(portName, /*args*/);

  MPI_Comm intercomm1;
  // First connection. Client corresponding to this connection connects and exits successfully.
  MPI_Comm_accept(portName, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm1);

  std::cout << "\nConnection 1 Accepted.";

  MPI_Comm intercomm2;
  // Second connection. Client corresponding to this connection connects, but waits at its
  // own MPI_Finalize() stage, and exits only after server reaches MPI_Finalize().
  MPI_Comm_accept(portName, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm2);

  std::cout << "\nConnection 2 Accepted.";
  std::cout << "\nBefore disconnect.";

  // Stop accepting connections, then disconnect from both clients.
  MPI_Close_port(portName);
  MPI_Comm_disconnect(&intercomm1);
  MPI_Comm_disconnect(&intercomm2);

  std::cout << "\nBefore Finalize().";
  MPI_Finalize();
  std::cout << "\nAfter Finalize.";
}
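
The helper publishServerPortNameToFile() used above and its client-side counterpart getServerPortNameFromFile() are not shown in the post. Purely as an illustration, a file-based exchange of the port name could look like the sketch below. The names writePortFile and readPortFile, their signatures, and the write-then-rename scheme are assumptions, not the poster's actual code, and the sketch assumes the port name contains no newline.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical stand-in for the server-side helper: writes the port name to a
// temporary file and renames it into place, so that on POSIX systems a reader
// does not see a half-written file.
bool writePortFile(const std::string& path, const std::string& portName) {
  const std::string tmp = path + ".tmp";
  {
    std::ofstream out(tmp, std::ios::trunc);
    if (!out) return false;
    out << portName << '\n';
    out.flush();
    if (!out) return false;
  }  // the stream is closed here, before the rename
  return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Hypothetical stand-in for the client-side helper: reads the first line of
// the file. Returns false if the file is missing or empty.
bool readPortFile(const std::string& path, std::string& portName) {
  std::ifstream in(path);
  if (!in) return false;
  return static_cast<bool>(std::getline(in, portName)) && !portName.empty();
}
```

Under these assumptions the server would call writePortFile() right after MPI_Open_port(), and a client would retry readPortFile() until it returns true and then pass the result to MPI_Comm_connect().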



//client.cxx
#include <mpi.h>
#include <iostream>
#include <string>

int main() {
  MPI_Init(NULL, NULL);

  // Look up the port name that the server published.
  std::string portName = getServerPortNameFromFile(/*args*/);

  MPI_Comm intercomm;
  MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
  std::cout << "\nConnected";

  MPI_Comm_disconnect(&intercomm);

  std::cout << "\nBefore Finalize()";
  MPI_Finalize();
  std::cout << "\nAfter Finalize().\nClient Exited";
}







Output after the first client connection is made and the client exits:

shell1:
$ ./server
Connection 1 Accepted
< server waiting to accept next connection >

shell2:
$ ./client          // first client requesting connection
Connected
Before Finalize()
After Finalize()
Client Exited
$



Output after the second client requests a connection (at this point the first client has already connected and exited successfully):

shell1:
$ ./server
Connection 1 Accepted
Connection 2 Accepted.
Before disconnect. Press any key   // key pressed
Before Finalize(). Press any key   // key not yet pressed

shell2:
$ ./client   // this is the second client connection to the same server

Connected
Before Finalize().
< this client process waits here till the server reaches its Finalize() >





Regards,

Rupsa