[mpich-discuss] MPICH Usage Issue: Client process not exiting till server reaches MPI_Finalize() in MPI client server singleton INIT connection

Thu Jan 27 13:41:42 CST 2022

Hello,

I am posting here because I already asked this question on Stack Overflow
and did not receive any answer or comment.

I am trying to implement a feature that performs a large computation in
parallel over multiple machines. I am evaluating ZeroMQ as well as MPI for
this, and will adopt whichever gives better performance and is easier to
use and better supported. Your help is much appreciated.

I am implementing this feature in an existing large application on Linux,
using MPI client-server parallel computation with the 'Singleton INIT'
mechanism. I am using mpich-3.4.1. The main process becomes an MPI server
on certain user inputs. Client processes connect to this server, share a
large parallel computation between themselves, and return the result to
the server. The client and server processes are started independently
(Singleton INIT). All client processes should be able to exit normally
after disconnecting from the server, even though the server is still
running (doing some other work, but not accepting any more client
connections).

My question is:

Is it possible to keep the server process running and executing other
tasks, while all of its client connections have closed and client processes
exited?

In my client-server MPI code, I see that for the last client connection to
a server (the one corresponding to the last MPI_Comm_accept() call) the
client process always gets stuck in its MPI_Finalize() until the server
reaches its own MPI_Finalize(). Thus the last client process does not exit
until the server reaches its MPI_Finalize() stage. For the earlier
connections, however, the client processes are able to exit normally.

On the server side, 'lsof' shows that each call to MPI_Comm_accept()
releases the file descriptor left over from the previous connection. For
the last connection there is no subsequent MPI_Comm_accept() call, so its
file descriptor is not released until the server calls MPI_Finalize().

What am I missing or doing incorrectly? In the following code, how can I
make the last client exit as well, instead of waiting in MPI_Finalize(),
even though the server is still running?

Is there any way to set a timeout on MPI_Comm_accept(), or to interrupt
it from a separate thread?

My client and server code are as follows:

//server.cxx
#include <mpi.h>
#include <iostream>

int main() {
  char ch;  // for the "Press any key" pauses seen in the output (omitted here)
  MPI_Init(NULL, NULL);
  char portName[MPI_MAX_PORT_NAME];
  MPI_Open_port(MPI_INFO_NULL, portName);

  publishServerPortNameToFile(portName, /*args*/);

  MPI_Comm intercomm1;
  // First connection. The client corresponding to this connection
  // connects and exits successfully.
  MPI_Comm_accept(portName, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm1);

  std::cout << "\nConnection 1 Accepted.";

  MPI_Comm intercomm2;
  // Second connection. The client corresponding to this connection
  // connects, but waits at its own MPI_Finalize() stage, and exits only
  // after the server reaches MPI_Finalize().
  MPI_Comm_accept(portName, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm2);

  std::cout << "\nConnection 2 Accepted.";
  std::cout << "\nBefore disconnect.";

  MPI_Close_port(portName);
  MPI_Comm_disconnect(&intercomm1);
  MPI_Comm_disconnect(&intercomm2);

  std::cout << "\nBefore Finalize().";
  MPI_Finalize();
  std::cout << "\nAfter Finalize.";
  return 0;
}

//client.cxx
#include <mpi.h>
#include <iostream>
#include <string>

int main() {
  MPI_Init(NULL, NULL);

  std::string portName = getServerPortNameFromFile(/*args*/);

  MPI_Comm intercomm;
  MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &intercomm);
  std::cout << "\nConnected";

  MPI_Comm_disconnect(&intercomm);

  std::cout << "\nBefore Finalize()";
  MPI_Finalize();
  std::cout << "\nAfter Finalize().\nClient Exited";
  return 0;
}
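
The two file helpers called in the listings above are not shown. A minimal
sketch of how they might look follows, assuming the port name is exchanged
through a plain text file on a filesystem visible to both server and
clients. The 'path' parameter and the write-then-rename step are
illustrative assumptions, not the actual code from the application.

```cpp
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: write the port name to a temporary file first, then
// rename it into place, so a client does not read a half-written file.
void publishServerPortNameToFile(const std::string& portName,
                                 const std::string& path) {
  const std::string tmpPath = path + ".tmp";
  {
    std::ofstream out(tmpPath, std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + tmpPath);
    out << portName << '\n';
    out.flush();
    if (!out) throw std::runtime_error("cannot write " + tmpPath);
  }  // the stream is closed here, before the rename
  if (std::rename(tmpPath.c_str(), path.c_str()) != 0)
    throw std::runtime_error("cannot rename " + tmpPath + " to " + path);
}

// Hypothetical helper: read the first line of the file as the port name.
std::string getServerPortNameFromFile(const std::string& path) {
  std::ifstream in(path);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::string portName;
  std::getline(in, portName);
  if (portName.empty()) throw std::runtime_error("empty port file " + path);
  return portName;
}
```

With helpers of this shape, the server would pass the string filled in by
MPI_Open_port() together with a file path agreed on with the clients, and
each client would pass the same path before calling MPI_Comm_connect().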

Output after first client connection is made, and the client exits:

shell1:
$ ./server
Connection 1 Accepted
< server waiting to accept next connection >

shell2:
$ ./client          // first client requesting connection
Connected
Before Finalize()
After Finalize()
Client Exited
$

Output after the second client requests connection (at this point the
first client had successfully connected and exited):

shell1:
$ ./server
Connection 1 Accepted
Connection 2 Accepted.
Before disconnect. Press any key   // key pressed
Before Finalize(). Press any key   // key not yet pressed

shell2:
$ ./client   // this is the second client connection to the same server

Connected
Before Finalize().
< this client process waits here till the server reaches its Finalize() >

Regards,

Rupsa