[mpich-discuss] MPI_Comm_split and end with Finalize

Oden, Lena loden at anl.gov
Mon Feb 29 17:21:30 CST 2016


I would say yes.

One question about your first example: did you call MPI_Comm_disconnect on both the intercomm and the intracomm?
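
As a rough sketch with placeholder names (say, intercomm from MPI_Comm_accept/MPI_Comm_connect and intracomm from MPI_Intercomm_merge), I mean something like:

    MPI_Comm_disconnect(&intracomm);  /* the merged communicator */
    MPI_Comm_disconnect(&intercomm);  /* the accept/connect communicator */
    MPI_Finalize();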


On Feb 29, 2016, at 5:07 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:

Does rank 0 stop here because ranks 0 and 1 are still connected through MPI_COMM_WORLD? If so, then the code I attached earlier does not explicitly demonstrate the problem, since I can't disconnect from MPI_COMM_WORLD.

On Mon, Feb 29, 2016 at 6:04 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
Hi Halim,

Please find attached a minimal example that seems to reproduce this. It is a simplified version that does not do a nameserver lookup.

I compiled this as
mpicxx -g test_two_procs.c -L /opt/mpich-3.2_install/lib/ -o two_procs

Run this with
mpiexec -np 2 ./two_procs

We can see that rank 0 waits at MPI_Finalize until rank 1 reaches it, even though both ranks call MPI_Comm_disconnect. My understanding is that rank 0 should have finished without having to wait for rank 1.

Also, MPI_Comm_disconnect cannot be called on MPI_COMM_WORLD.

I have another minimal example that involves a nameserver lookup, but I think this example should demonstrate the issue for now. Hope this helps.
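
For reference, the attached test is roughly of the following shape (a sketch of the idea, not the exact attached file):

#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm lone_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Give each rank its own singleton communicator. */
    MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &lone_comm);

    /* Both ranks disconnect the split communicator. */
    MPI_Comm_disconnect(&lone_comm);

    /* Rank 1 delays, so rank 0 reaches MPI_Finalize first and busy-waits. */
    if (rank == 1)
        sleep(10);

    MPI_Finalize();
    return 0;
}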


On Mon, Feb 29, 2016 at 5:39 PM, Halim Amer <aamer at anl.gov> wrote:
MPI_Finalize is collective over a set of connected processes. If the server hangs in MPI_Finalize, it means that it is still connected to a subset of the clients. It is difficult to know the reason without a concrete piece of code. If you send us a minimal example that reproduces the problem, we might be able to identify the issue.

--Halim

www.mcs.anl.gov/~aamer

On 2/29/16 9:00 AM, K. N. Ramachandran wrote:
Hello all,

I had tried just calling MPI_Comm_disconnect instead of MPI_Comm_split.

I tried this on just the server side, as well as on both the server
and the client, but I still see the busy-wait at MPI_Finalize on the
server side. Can anyone offer further input on this?

It looks like the server process should be able to terminate early, but
is held up by the client, even though they should be disconnected from
each other.



On Sat, Feb 27, 2016 at 7:00 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:

    Hi Pavan,

    Thank you for the reply. I have presented only a very simplified
    case of one server and one client and that is why the problem looks
    strange.

    The general case is one server acting as a meeting point: N clients
    join the server, and one intracomm is formed among them all. The
    server then splits off and terminates, leaving the intracomm so that
    the clients can work amongst themselves.

    I had also tried MPI_Comm_disconnect on the server after calling
    MPI_Comm_split, but even in that case the server busy-waits for the
    client at MPI_Finalize. The single-server, single-client case was
    only meant to demonstrate the problem I am facing.

    Please let me know if you might need more information. Thanks.

    On Sat, Feb 27, 2016 at 11:59 AM, Balaji, Pavan <balaji at anl.gov> wrote:


        It's unclear what exactly you are trying to do here.  Why are
        the clients connecting to the server and then immediately
        "splitting off"?

        Your "split-off" functionality needs to be implemented using
        MPI_Comm_disconnect, not using MPI_Comm_split.  Comm_split
        divides a communicator into smaller communicators, but all
        processes are still very much connected.  So as long as the
        server process is connected to the client processes, it might
        still receive messages from the client process and thus cannot
        simply exit.  Comm_disconnect, on the other hand, disconnects
        the client processes from the server processes.
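
        To put it in code terms with placeholder names (a sketch, not
        your exact code): after MPI_Comm_accept/MPI_Comm_connect and
        MPI_Intercomm_merge, this keeps the server connected to the
        clients,

            MPI_Comm_split(intracomm, (rank == 0), rank, &lone_comm);

        whereas this actually disconnects it, so its MPI_Finalize no
        longer has to wait for them:

            MPI_Comm_disconnect(&intracomm);  /* the merged communicator */
            MPI_Comm_disconnect(&intercomm);  /* from accept/connect */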

        But then again, I have no idea why you are connecting to the
        server and disconnecting immediately.

           -- Pavan

          > On Feb 26, 2016, at 5:31 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
         >
         > Hello all,
         >
         > I have recently begun working on a project that uses
        MPICH-3.2 and I am trying to resolve an issue where a server
        process busy waits at MPI_Finalize.
         >
         > We are trying to create a server process that accepts
        incoming connections from a known number of clients (say, N
        clients), forms a new communicator amongst everyone (server and
        clients) and then splits itself from the group and terminates,
        so that the clients now only work with each other.
         >
          > For very problem-specific reasons, we cannot do
          > 'mpiexec -np N (other args)'
         >
          > So we have a server that publishes a service name to a
         nameserver and clients look up the name to join the server. The
         server and client processes are started with separate calls to
         mpiexec: one to start the server and N more to start the clients.
         >
         > The server process busy-waits at the MPI_Finalize call, after
        it splits from the communicator and only finishes when all other
        clients reach their MPI_Finalize too.
         >
         > Consider a simplified case of only one server and one client.
        The simplified pseudocode is:
         >
         > Server process:
         > MPI_Init();
         > MPI_Open_port(...);
         > MPI_Publish_name(...); //publish service name to nameserver
         >
          > MPI_Comm_accept(...); // accept incoming connections and store into intercomm
         > MPI_Intercomm_merge(...);  // merge new client into intra-comm
         >
         > // now split the server from the client
         > MPI_Comm_rank(intra comm, rank); // rank=0
         > MPI_Comm_split(intra comm, (rank==0), rank, lone comm);
         >
          > MPI_Finalize(); // busy-waits here for the duration of the client's sleep
         >
         > Client process: (simplified - assuming only one client is
        trying to connect)
         > MPI_Init();
          > MPI_Lookup_name(...);
          > MPI_Comm_connect(...);
         >
         > // merge
         > MPI_Intercomm_merge(...); // merge with server
         >
         > // get rank and split
         > MPI_Comm_rank(intra comm, rank);  // rank=1
         > MPI_Comm_split(intra comm, rank==0, rank, lone comm);
         >
          > sleep(10); // sleep for 10 seconds - causes server to busy-wait at MPI_Finalize for sleep duration
         >
         > MPI_Finalize(); // server and client finish here
         >
         > So my questions are:
         >
         > 1) Is busy-wait at MPI_Finalize the expected behaviour?
         >
         > 2) How to truly "disconnect" the server, so that it can end
        immediately at MPI_Finalize()? I had tried MPI_Comm_disconnect
        (also MPI_Comm_free) on both the server and client, but that
        didn't help.
         >
          > 3) We don't want to see the server process consuming one
         core at 100% while it waits at MPI_Finalize. Are there
         alternatives other than making the server process sleep, wake up,
         keep polling the client, and then finally call MPI_Finalize?
         >
         > Thank you for any inputs that you can give here.
         >
         >
         > Regards,
         > K.N.Ramachandran





    Regards,
    K.N.Ramachandran




Regards,
K.N.Ramachandran






Thanks,
K.N.Ramachandran



--
K.N.Ramachandran
Ph: 814-441-4279
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


