[mpich-discuss] MPI_Comm_split and end with Finalize

K. N. Ramachandran knram06 at gmail.com
Mon Feb 29 17:07:49 CST 2016


Does rank 0 stop here because ranks 0 and 1 are still connected through
MPI_COMM_WORLD? If so, then the code attached earlier does not explicitly
demonstrate the problem, since I cannot disconnect from MPI_COMM_WORLD.

On Mon, Feb 29, 2016 at 6:04 PM, K. N. Ramachandran <knram06 at gmail.com>
wrote:

> Hi Halim,
>
> Please find attached a minimal example that seems to reproduce this. It
> is a simpler version that does not do the nameserver lookup.
>
> I compiled this as
> mpicxx -g test_two_procs.c -L /opt/mpich-3.2_install/lib/ -o two_procs
>
> Run this with
> mpiexec -np 2 ./two_procs
>
> We can see that rank 0 waits at MPI_Finalize until rank 1 reaches it, even
> though both rank 0 and rank 1 call MPI_Comm_disconnect. It is my
> understanding that rank 0 should have finished without having to wait for
> rank 1.
>
> Also, MPI_Comm_disconnect cannot be called on MPI_COMM_WORLD.
>
> I have another minimal example that involves the nameserver lookup, but I
> think this example should demonstrate the problem for now. Hope this helps.
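>
> In case the attachment gets scrubbed by the archive, the example is
> essentially the following (a reconstructed sketch from the description
> above; details such as the exact split arguments and the sleep length
> are approximate):
>
> #include <mpi.h>
> #include <stdio.h>
> #include <unistd.h>
>
> int main(int argc, char **argv)
> {
>     int rank;
>     MPI_Comm splitcomm;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     // put each rank in a communicator of its own
>     MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &splitcomm);
>
>     // both ranks disconnect from the split communicator;
>     // MPI_COMM_WORLD itself cannot be disconnected
>     MPI_Comm_disconnect(&splitcomm);
>
>     if (rank == 1)
>         sleep(10); // delay rank 1 so rank 0's wait is visible
>
>     MPI_Finalize(); // rank 0 still blocks here until rank 1 arrives
>     printf("rank %d finished\n", rank);
>     return 0;
> }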
>
>
> On Mon, Feb 29, 2016 at 5:39 PM, Halim Amer <aamer at anl.gov> wrote:
>
>> MPI_Finalize is collective over a set of connected processes. If the
>> server hangs in MPI_Finalize, it means that it is still connected to a
>> subset of the clients. It is difficult to know the reason without a
>> concrete piece of code. If you send us a minimal example that reproduces
>> the problem, we might be able to identify the issue.
>>
>> --Halim
>>
>> www.mcs.anl.gov/~aamer
>>
>> On 2/29/16 9:00 AM, K. N. Ramachandran wrote:
>>
>>> Hello all,
>>>
>>> I had tried calling MPI_Comm_disconnect instead of MPI_Comm_split.
>>>
>>> I tried this on just the server side, as well as on both the server
>>> and the client, but I still see the busy-wait at MPI_Finalize on
>>> the server side. Can anyone offer further input on this?
>>>
>>> It looks like the server process should be able to terminate early, but
>>> is held up by the client, even though they should be disconnected from
>>> each other.
>>>
>>>
>>>
>>> On Sat, Feb 27, 2016 at 7:00 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
>>>
>>>     Hi Pavan,
>>>
>>>     Thank you for the reply. I have presented only a very simplified
>>>     case of one server and one client, which is why the problem looks
>>>     strange.
>>>
>>>     The general case is one server acting as a meeting point: N clients
>>>     join the server, and one intracomm is formed among them all. The
>>>     server then splits off and terminates, leaving the intracomm so
>>>     that the clients can work amongst themselves.
>>>
>>>     I had also tried MPI_Comm_disconnect on the server after calling
>>>     MPI_Comm_split, but even in that case the server busy-waits for the
>>>     client at MPI_Finalize. The single server and single client were
>>>     only meant to demonstrate the problem I am facing.
>>>
>>>     Please let me know if you might need more information. Thanks.
>>>
>>>     On Sat, Feb 27, 2016 at 11:59 AM, Balaji, Pavan <balaji at anl.gov> wrote:
>>>
>>>
>>>         It's unclear what exactly you are trying to do here.  Why are
>>>         the clients connecting to the server and then immediately
>>>         "splitting off"?
>>>
>>>         Your "split-off" functionality needs to be implemented using
>>>         MPI_Comm_disconnect, not using MPI_Comm_split.  Comm_split
>>>         divides a communicator into smaller communicators, but all
>>>         processes are still very much connected.  So as long as the
>>>         server process is connected to the client processes, it might
>>>         still receive messages from the client process and thus cannot
>>>         simply exit.  Comm_disconnect, on the other hand, disconnects
>>>         the client processes from the server processes.
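>>>
>>>         To make that concrete, a minimal sketch of the disconnect-based
>>>         split-off on the server (assuming "intercomm" came from
>>>         MPI_Comm_accept and "intracomm" from MPI_Intercomm_merge; the
>>>         names are placeholders):
>>>
>>>         // release every communicator that still ties the server
>>>         // to the clients, then finalize independently of them
>>>         MPI_Comm_disconnect(&intracomm); // from MPI_Intercomm_merge
>>>         MPI_Comm_disconnect(&intercomm); // from MPI_Comm_accept
>>>         MPI_Finalize();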
>>>
>>>         But then again, I have no idea why you are connecting to the
>>>         server and disconnecting immediately.
>>>
>>>            -- Pavan
>>>
>>>          > On Feb 26, 2016, at 5:31 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
>>>          >
>>>          > Hello all,
>>>          >
>>>          > I have recently begun working on a project that uses
>>>         MPICH-3.2 and I am trying to resolve an issue where a server
>>>         process busy-waits at MPI_Finalize.
>>>          >
>>>          > We are trying to create a server process that accepts
>>>         incoming connections from a known number of clients (say, N
>>>         clients), forms a new communicator amongst everyone (server and
>>>         clients) and then splits itself from the group and terminates,
>>>         so that the clients now only work with each other.
>>>          >
>>>          > For very problem-specific reasons, we cannot do
>>>          > 'mpiexec -np N (other args)'
>>>          >
>>>          > So we have a server that publishes a service name to a
>>>          > nameserver, and clients look up the name to join the server.
>>>          > The server and client processes are started with separate
>>>          > calls to mpiexec: one to start the server and N more to start
>>>          > the clients.
>>>          >
>>>          > The server process busy-waits at the MPI_Finalize call after
>>>          > it splits from the communicator, and it finishes only when all
>>>          > the clients reach their MPI_Finalize too.
>>>          >
>>>          > Consider a simplified case of only one server and one client.
>>>         The simplified pseudocode is:
>>>          >
>>>          > Server process:
>>>          > MPI_Init();
>>>          > MPI_Open_port(...);
>>>          > MPI_Publish_name(...); // publish service name to nameserver
>>>          >
>>>          > MPI_Comm_accept(...); // accept incoming connection, store into intercomm
>>>          > MPI_Intercomm_merge(...); // merge new client into intracomm
>>>          >
>>>          > // now split the server from the client
>>>          > MPI_Comm_rank(intracomm, &rank); // rank == 0
>>>          > MPI_Comm_split(intracomm, (rank == 0), rank, &lonecomm);
>>>          >
>>>          > MPI_Finalize(); // busy-waits here for the client's sleep duration
>>>          >
>>>          > Client process: (simplified - assuming only one client is trying to connect)
>>>          > MPI_Init();
>>>          > MPI_Lookup_name(...);
>>>          > MPI_Comm_connect(...); // connect to the server, yielding intercomm
>>>          >
>>>          > // merge
>>>          > MPI_Intercomm_merge(...); // merge with server into intracomm
>>>          >
>>>          > // get rank and split
>>>          > MPI_Comm_rank(intracomm, &rank); // rank == 1
>>>          > MPI_Comm_split(intracomm, (rank == 0), rank, &lonecomm);
>>>          >
>>>          > sleep(10); // sleep 10 seconds - the server busy-waits at MPI_Finalize for this duration
>>>          >
>>>          > MPI_Finalize(); // server and client finish here
>>>          >
>>>          > So my questions are:
>>>          >
>>>          > 1) Is the busy-wait at MPI_Finalize the expected behaviour?
>>>          >
>>>          > 2) How do I truly "disconnect" the server, so that it can
>>>          > finish immediately at MPI_Finalize? I tried MPI_Comm_disconnect
>>>          > (and also MPI_Comm_free) on both the server and the client, but
>>>          > that didn't help.
>>>          >
>>>          > 3) We don't want the server process to consume one core at 100%
>>>          > while it waits at MPI_Finalize. Are there alternatives other
>>>          > than having the server sleep, wake up, and keep polling a
>>>          > client before finally calling MPI_Finalize? A rough sketch of
>>>          > this polling idea follows below.
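>>>          >
>>>          > For concreteness (TAG_DONE, the intracomm handle, and the
>>>          > matching empty send on the client side are hypothetical
>>>          > placeholders, not code we have written):
>>>          >
>>>          > // server side: wait for an empty "done" message from the
>>>          > // client, sleeping between polls instead of spinning
>>>          > MPI_Request req;
>>>          > int done = 0;
>>>          > MPI_Irecv(NULL, 0, MPI_BYTE, 1, TAG_DONE, intracomm, &req);
>>>          > while (!done) {
>>>          >     MPI_Test(&req, &done, MPI_STATUS_IGNORE);
>>>          >     if (!done)
>>>          >         usleep(100000); // 100 ms between polls
>>>          > }
>>>          > MPI_Finalize();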
>>>          >
>>>          > Thank you for any inputs that you can give here.
>>>          >
>>>          >
>>>          > Regards,
>>>          > K.N.Ramachandran
>>>
>>>
>>>
>>>
>>>
>>>     Regards,
>>>     K.N.Ramachandran
>>>
>>>
>>>
>>>
>>> Regards,
>>> K.N.Ramachandran
>>>
>>>
>
>
>
> Thanks,
> K.N.Ramachandran
>



-- 
K.N.Ramachandran
Ph: 814-441-4279
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


More information about the discuss mailing list