[mpich-discuss] MPI_Comm_split and end with Finalize
Halim Amer
aamer at anl.gov
Mon Feb 29 17:23:55 CST 2016
Yes, both processes are still connected in MPI_COMM_WORLD.
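For example, a minimal sketch along these lines (assuming both ranks are
launched by a single "mpiexec -np 2") will show rank 0 sitting in
MPI_Finalize until rank 1 gets there:

    /* Sketch: ranks started by the same mpiexec stay connected through
     * MPI_COMM_WORLD, so MPI_Finalize is collective over both of them. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 1)
            sleep(10);  /* rank 0 reaches MPI_Finalize immediately ... */
        MPI_Finalize(); /* ... but returns only once rank 1 arrives here */
        printf("rank %d done\n", rank);
        return 0;
    }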
--Halim
On 2/29/16 5:07 PM, K. N. Ramachandran wrote:
> Does rank 0 stop here because ranks 0 and 1 are still connected through
> MPI_COMM_WORLD? If so, then the code attached earlier does not explicitly
> demonstrate the problem, since I can't disconnect from MPI_COMM_WORLD.
>
> On Mon, Feb 29, 2016 at 6:04 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
>
> Hi Halim,
>
> Please find attached a minimal example that seems to reproduce this. It
> is a simplified version that does not do the nameserver lookup.
>
> I compiled this as
> mpicxx -g test_two_procs.c -L /opt/mpich-3.2_install/lib/ -o two_procs
>
> Run this with
> mpiexec -np 2 ./two_procs
>
> We can see that rank 0 waits at Finalize until rank 1 reaches it, even
> though both rank 0 and rank 1 call MPI_Comm_disconnect. It is my
> understanding that rank 0 should have finished without having to wait
> for rank 1.
>
> Also, MPI_Comm_disconnect cannot be called on MPI_COMM_WORLD.
>
> I have another minimal example that involves the nameserver lookup, but
> I think this example should demonstrate the problem for now. Hope this
> helps.
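>
> In outline, the example does roughly the following (a sketch only; the
> attachment has the exact code):
>
>     #include <mpi.h>
>     #include <unistd.h>
>
>     int main(int argc, char **argv)
>     {
>         int rank;
>         MPI_Comm lonecomm;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         /* give each rank its own communicator */
>         MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &lonecomm);
>         MPI_Comm_disconnect(&lonecomm); /* both ranks disconnect it */
>         if (rank == 1)
>             sleep(10); /* rank 0 nevertheless waits at Finalize */
>         MPI_Finalize();
>         return 0;
>     }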
>
>
> On Mon, Feb 29, 2016 at 5:39 PM, Halim Amer <aamer at anl.gov> wrote:
>
> MPI_Finalize is collective over a set of connected processes. If
> the server hangs in MPI_Finalize, it means that it is still
> connected to a subset of the clients. It is difficult to know
> the reason without a concrete piece of code. If you send us a
> minimal example that reproduces the problem, we might be able to
> identify the issue.
>
> --Halim
>
> www.mcs.anl.gov/~aamer
>
> On 2/29/16 9:00 AM, K. N. Ramachandran wrote:
>
> Hello all,
>
> I tried calling MPI_Comm_disconnect instead of MPI_Comm_split, first on
> just the server side and then on both the server and the client, but I
> still see the busy-wait at MPI_Finalize on the server side. Can anyone
> give any further inputs on this?
>
> It looks like the server process should be able to terminate early, but
> it is held up by the client, even though they should be disconnected
> from each other.
>
>
>
> On Sat, Feb 27, 2016 at 7:00 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
>
> Hi Pavan,
>
> Thank you for the reply. I have presented only a very simplified case
> of one server and one client, and that is why the problem looks
> strange.
>
> The general case is one server acting as a meeting point: N clients
> join the server, and one intracommunicator is formed among them all.
> The server then splits off and terminates, leaving the intracommunicator
> to the clients, who then work amongst themselves.
>
> I had also tried MPI_Comm_disconnect on the server, after calling
> MPI_Comm_split, but even in that case the server busy-waits for the
> client at Finalize. The single server and single client were only to
> demonstrate the problem I am facing.
>
> Please let me know if you need more information. Thanks.
>
> On Sat, Feb 27, 2016 at 11:59 AM, Balaji, Pavan <balaji at anl.gov> wrote:
>
>
> It's unclear what exactly you are trying to do here. Why are the
> clients connecting to the server and then immediately "splitting off"?
>
> Your "split-off" functionality needs to be implemented using
> MPI_Comm_disconnect, not MPI_Comm_split. Comm_split divides a
> communicator into smaller communicators, but all processes are still
> very much connected. So as long as the server process is connected to
> the client processes, it might still receive messages from them and
> thus cannot simply exit. Comm_disconnect, on the other hand, disconnects
> the client processes from the server processes.
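>
> On the server side the teardown would look roughly like this (a sketch;
> error checking omitted, and "port" filled in by an earlier
> MPI_Open_port call):
>
>     char port[MPI_MAX_PORT_NAME];    /* set by MPI_Open_port earlier */
>     MPI_Comm intercomm, intracomm;
>     MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
>     MPI_Intercomm_merge(intercomm, 0, &intracomm);
>     /* ... communicate with the clients ... */
>     MPI_Comm_free(&intracomm);       /* release the merged intracomm */
>     MPI_Comm_disconnect(&intercomm); /* actually sever the connection */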
>
> But then again, I have no idea why you are connecting to the server and
> disconnecting immediately.
>
> -- Pavan
>
> > On Feb 26, 2016, at 5:31 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
> >
> > Hello all,
> >
> > I have recently begun working on a project that uses MPICH-3.2, and
> > I am trying to resolve an issue where a server process busy-waits at
> > MPI_Finalize.
> >
> > We are trying to create a server process that accepts incoming
> > connections from a known number of clients (say, N clients), forms a
> > new communicator amongst everyone (server and clients), and then
> > splits itself from the group and terminates, so that the clients now
> > only work with each other.
> >
> > For very problem-specific reasons, we cannot do
> > 'mpiexec -np N (other args)'
> >
> > So we have a server that publishes a service name to a nameserver,
> > and clients look up the name to join the server. The server and
> > client processes are started with separate calls to mpiexec: one to
> > start the server and N more calls to start the clients.
> >
> > The server process busy-waits at the MPI_Finalize call after it
> > splits from the communicator, and it only finishes when all the
> > clients reach their MPI_Finalize too.
> >
> > Consider a simplified case of only one server and one client. The
> > simplified pseudocode is:
> >
> > Server process:
> > MPI_Init();
> > MPI_Open_port(...);
> > MPI_Publish_name(...);    // publish service name to nameserver
> >
> > MPI_Comm_accept(...);     // accept incoming connection into intercomm
> > MPI_Intercomm_merge(...); // merge new client into intracomm
> >
> > // now split the server from the client
> > MPI_Comm_rank(intracomm, &rank);                       // rank == 0
> > MPI_Comm_split(intracomm, (rank == 0), rank, &lonecomm);
> >
> > MPI_Finalize();   // busy-waits here for the client's sleep duration
> >
> > Client process: (simplified - assuming only one client is trying to
> > connect)
> > MPI_Init();
> > MPI_Lookup_name(...);
> > MPI_Comm_connect(...);
> >
> > // merge
> > MPI_Intercomm_merge(...); // merge with server
> >
> > // get rank and split
> > MPI_Comm_rank(intracomm, &rank);                       // rank == 1
> > MPI_Comm_split(intracomm, (rank == 0), rank, &lonecomm);
> >
> > sleep(10);        // sleep for 10 seconds - causes server to busy-wait
> >                   // at MPI_Finalize for the sleep duration
> >
> > MPI_Finalize();   // server and client finish here
> >
> > So my questions are:
> >
> > 1) Is the busy-wait at MPI_Finalize the expected behaviour?
> >
> > 2) How do I truly "disconnect" the server, so that it can end
> > immediately at MPI_Finalize()? I had tried MPI_Comm_disconnect (and
> > also MPI_Comm_free) on both the server and client, but that didn't
> > help.
> >
> > 3) We don't want to see the server process consuming one core at 100%
> > while it waits at MPI_Finalize. Are there alternatives apart from
> > making the server process sleep, wake up, and keep polling a client,
> > and then finally call MPI_Finalize?
> >
> > Thank you for any inputs that you can give here.
> >
> >
> > Regards,
> > K.N.Ramachandran
>
>
>
>
> Regards,
> K.N.Ramachandran
>
>
>
>
> Regards,
> K.N.Ramachandran
>
>
>
>
>
>
> Thanks,
> K.N.Ramachandran
>
>
>
>
> --
> K.N.Ramachandran
> Ph: 814-441-4279
>
>
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss