[mpich-discuss] How to gracefully exit an MPI server process when it is waiting on MPI_Comm_accept
Zhou, Hui
zhouh at anl.gov
Sat Jan 29 23:38:11 CST 2022
Hi Rupsa,
The error message could be better. "Invalid port" here actually means that the connection timed out, so your 2-second timeout did work. You should call MPI_Comm_set_errhandler to set the error handler to MPI_ERRORS_RETURN so that your application can check the error and respond accordingly. The default error handler aborts.
As far as I can tell, the current code leaks resources when the timeout happens. I just created this issue -- https://github.com/pmodels/mpich/issues/5815 -- to track it.
--
Hui Zhou
Issue #5815 (pmodels/mpich): ch4/ofi: better error handling in MPI_Comm_{accept,connect} <https://github.com/pmodels/mpich/issues/5815>
In MPIDI_OFI_dynamic_send and MPIDI_OFI_dynamic_recv, when error happens, we should try cancel the send or recv. Otherwise, we may leave or accumulated unclosed handle in libfabric. Separately, whe...
________________________________
From: Rupsa Chakraborty <c.rupsa at gmail.com>
Sent: Saturday, January 29, 2022 1:29 AM
To: Zhou, Hui <zhouh at anl.gov>; protze at itc.rwth-aachen.de <protze at itc.rwth-aachen.de>
Cc: discuss at mpich.org <discuss at mpich.org>
Subject: Re: [mpich-discuss] How to gracefully exit an MPI server process when it is waiting on MPI_Comm_accept
Hi Hui, Joachim,
Thanks for the information.
I could not find a clear example on the internet of how to populate MPI_Info and use it with MPI_Comm_accept. I tried the following, but it gives the error shown below.
Could you provide a small example that sets up a timeout with MPI_Comm_accept? Do I need to populate the port info with MPI_Info too?
MPI_Init(NULL, NULL);
char portName[MPI_MAX_PORT_NAME];
MPI_Open_port(MPI_INFO_NULL, portName);
MPI_Info info;
MPI_Info_create( &info );
MPI_Info_set(info, "timeout", "2");
MPI_Comm intercomm;
MPI_Comm_accept(portName, info, 0, MPI_COMM_SELF, &intercomm);
Error:
$ ./server
Waiting to accept....
Abort(671693606) on node 0 (rank 0 in comm 0): Fatal error in internal_Comm_accept: Invalid port, error stack:
internal_Comm_accept(102)....: MPI_Comm_accept(port_name=tag#0$connentry#020098ADAC1785B70000000000000000$, info=0x9c000002, 0, MPI_COMM_SELF, newcomm=0x7f79b17dbcb8) failed
MPID_Comm_accept(442)........:
dynamic_intercomm_create(400):
peer_intercomm_create(334)...:
(unknown)(): Invalid port
Regards,
Rupsa
On Fri, Jan 28, 2022 at 8:47 PM Joachim Protze <protze at itc.rwth-aachen.de> wrote:
Hi,
since you have a separate thread for the accept, you could just set a
flag in the main thread and then open a connection from the main thread
to match the accept. The accept thread sees the flag, immediately closes
the connection and leaves the while loop.
- Joachim
On 28.01.22 at 05:54, Rupsa Chakraborty via discuss wrote:
>
> Hello,
>
> I am asking here because I already asked this question on Stack
> Overflow and did not receive any answer or comment:
> https://stackoverflow.com/questions/70745740/how-to-gracefully-exit-an-mpi-server-process-when-it-is-waiting-on-mpi-comm-acce
>
> My concern is as follows:
> I have an MPI server program that calls MPI_Comm_accept, in an infinite
> loop, on a separate thread. The main thread spawns this thread and in
> parallel does some other work. At some point the main thread
> decides to exit. It closes the port using MPI_Close_port(portName),
> but MPI_Comm_accept() is still waiting for connection requests.
>
>
> My questions are:
> - How does the main thread exit gracefully? Is there an MPI equivalent
> of closing a socket that I can call from the main thread? I am using MPICH.
>
> - Is there any way I can set a timeout on the MPI_Comm_accept call?
>
> Here is a dummy code that looks similar to my server code:
>
> // MPI Server Program
> void accept(std::string portName) {
>     while (true) {
>         MPI_Comm intercomm;
>         MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
>         // handle connection
>         MPI_Comm_disconnect(&intercomm);
>     }
> }
>
> int main() {
>     char ch;
>     MPI_Init(NULL, NULL);
>     char portName[MPI_MAX_PORT_NAME];
>     MPI_Open_port(MPI_INFO_NULL, portName);
>     publishName("nameServerFile.txt", "ocean", std::string(portName));
>     std::thread th(&accept, std::string(portName));
>     // do something
>     MPI_Close_port(portName);
>     std::cout << "\nClosed port" << std::endl;
>     th.join(); // main thread waits infinitely at this join
>                // as child thread is still waiting on MPI_Comm_accept
>     MPI_Finalize();
> }
>
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
--
Dr. rer. nat. Joachim Protze
IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074 Aachen (Germany)
Tel: +49 241 80- 24765
Fax: +49 241 80-624765
protze at itc.rwth-aachen.de
www.itc.rwth-aachen.de