[mpich-discuss] How to gracefully exit an MPI server process when it is waiting on MPI_Comm_accept

Rupsa Chakraborty c.rupsa at gmail.com
Sun Jan 30 03:15:53 CST 2022


Hi Hui,

Thank you for the clarification.

Regards,
Rupsa

On Sun, Jan 30, 2022 at 11:08 AM Zhou, Hui <zhouh at anl.gov> wrote:

> Hi Rupsa,
>
> The error message could be better. By "Invalid port", it actually meant
> that the connection timed out, so your 2-second timeout actually worked.
> You should use MPI_Comm_set_errhandler to set the error handler to
> MPI_ERRORS_RETURN, so that your application can check the error and
> respond accordingly. The default error handler aborts.
>
> When I checked, the current code will actually leak resources when a
> timeout happens. I just created this issue --
> https://github.com/pmodels/mpich/issues/5815 -- to track it.
>
> --
> Hui Zhou
>
> ------------------------------
> *From:* Rupsa Chakraborty <c.rupsa at gmail.com>
> *Sent:* Saturday, January 29, 2022 1:29 AM
> *To:* Zhou, Hui <zhouh at anl.gov>; protze at itc.rwth-aachen.de <
> protze at itc.rwth-aachen.de>
> *Cc:* discuss at mpich.org <discuss at mpich.org>
> *Subject:* Re: [mpich-discuss] How to gracefully exit an MPI server
> process when it is waiting on MPI_Comm_accept
>
> Hi Hui, Joachim,
>
> Thanks for the information.
> I am not able to find a clear example on the internet of how to
> populate an MPI_Info object and use it with MPI_Comm_accept. I tried the
> following, but it fails with the error shown below.
> Could you provide a small example that sets up a timeout with
> MPI_Comm_accept? Do I also need to populate the port info with MPI_Info?
>
>             MPI_Init(NULL, NULL);
>             char portName[MPI_MAX_PORT_NAME];
>             MPI_Open_port(MPI_INFO_NULL, portName);
>
>             MPI_Info info;
>             MPI_Info_create( &info );
>             MPI_Info_set(info, "timeout", "2");
>
>             MPI_Comm intercomm;
>             MPI_Comm_accept(portName, info, 0, MPI_COMM_SELF, &intercomm);
>
> Error:
> $ ./server
> Waiting to accept....
> Abort(671693606) on node 0 (rank 0 in comm 0): Fatal error in
> internal_Comm_accept: Invalid port, error stack:
> internal_Comm_accept(102)....:
> MPI_Comm_accept(port_name=tag#0$connentry#020098ADAC1785B70000000000000000$,
> info=0x9c000002, 0, MPI_COMM_SELF, newcomm=0x7f79b17dbcb8) failed
> MPID_Comm_accept(442)........:
> dynamic_intercomm_create(400):
> peer_intercomm_create(334)...:
> (unknown)(): Invalid port
>
> Regards,
> Rupsa
>
> On Fri, Jan 28, 2022 at 8:47 PM Joachim Protze <protze at itc.rwth-aachen.de>
> wrote:
>
> Hi,
>
> Since you have a separate thread for the accept, you could just set a
> flag in the main thread and then open a connection from the main thread
> to match the pending accept. The accept thread sees the flag, immediately
> closes the connection, and leaves the while loop.
>
>   - Joachim
>
> Am 28.01.22 um 05:54 schrieb Rupsa Chakraborty via discuss:
> >
> > Hello,
> >
> > I am asking this question in this forum because I already asked it on
> > Stack Overflow and did not receive any answer or comment:
> > https://stackoverflow.com/questions/70745740/how-to-gracefully-exit-an-mpi-server-process-when-it-is-waiting-on-mpi-comm-acce
> >
> > My concern is as follows:
> > I have an MPI server program that calls MPI_Comm_accept, in an infinite
> > loop, on a separate thread. The main thread spawns this thread and does
> > some other work in parallel. At some point the main thread decides to
> > exit. It closes the port using MPI_Close_port(portName); however,
> > MPI_Comm_accept() is still waiting for connection requests.
> >
> >
> > My question is:
> > - How does the main thread exit gracefully? I am using MPICH. Is there an
> > equivalent of a socket close in MPI that I can call from the main thread?
> >
> > - Is there any way I can set a timeout on the MPI_Comm_accept call?
> >
> > Here is a dummy code that looks similar to my server code:
> >
> > // MPI Server Program
> > void accept(std::string portName) {
> >     while (true) {
> >         MPI_Comm intercomm;
> >         MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
> >         // handle connection
> >         MPI_Comm_disconnect(&intercomm);
> >     }
> > }
> >
> > int main() {
> >     char ch;
> >     MPI_Init(NULL, NULL);
> >     char portName[MPI_MAX_PORT_NAME];
> >     MPI_Open_port(MPI_INFO_NULL, portName);
> >     publishName("nameServerFile.txt", "ocean", std::string(portName));
> >     std::thread th(&accept, std::string(portName));
> >     // do something
> >     MPI_Close_port(portName);
> >     std::cout << "\nClosed port" << std::endl;
> >     th.join(); // main thread waits infinitely at this join
> >                // as child thread is still waiting on MPI_Comm_accept
> >     MPI_Finalize();
> > }
> >
> >
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
>
> --
> Dr. rer. nat. Joachim Protze
>
> IT Center
> Group: High Performance Computing
> Division: Computational Science and Engineering
> RWTH Aachen University
> Seffenter Weg 23
> D 52074  Aachen (Germany)
> Tel: +49 241 80- 24765
> Fax: +49 241 80-624765
> protze at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
>
>