[mpich-discuss] MPI_Comm_split and end with Finalize

Halim Amer aamer at anl.gov
Mon Feb 29 17:23:55 CST 2016


Yes, both processes are still connected in MPI_COMM_WORLD.
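
For illustration, a minimal sketch of that behaviour (this is my own hedged example, not the test program attached below; it assumes MPICH-3.2 and two ranks started by one mpiexec):

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: even after MPI_Comm_split puts each rank alone in its own
 * communicator, ranks 0 and 1 remain connected through MPI_COMM_WORLD,
 * so MPI_Finalize on rank 0 cannot complete until rank 1 reaches it.
 * Run with: mpiexec -np 2 ./a.out */
int main(int argc, char **argv)
{
    int rank;
    MPI_Comm lone_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* color = rank: each rank ends up in a singleton communicator */
    MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &lone_comm);
    MPI_Comm_free(&lone_comm);

    if (rank == 1)
        sleep(10); /* rank 0 still waits in MPI_Finalize during this */

    printf("rank %d entering MPI_Finalize\n", rank);
    MPI_Finalize(); /* collective over the set of connected processes */
    return 0;
}
```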

--Halim

On 2/29/16 5:07 PM, K. N. Ramachandran wrote:
> Does rank 0 stop here because ranks 0 and 1 are still connected through
> MPI_COMM_WORLD? If so, then the code attached earlier does not
> demonstrate the problem, since I cannot disconnect from MPI_COMM_WORLD.
>
> On Mon, Feb 29, 2016 at 6:04 PM, K. N. Ramachandran <knram06 at gmail.com> wrote:
>
>     Hi Halim,
>
>     Please find attached a minimal example that seems to reproduce this.
>     It is a simplified version that does not do the nameserver lookup.
>
>     I compiled this as
>     mpicxx -g test_two_procs.c -L /opt/mpich-3.2_install/lib/ -o two_procs
>
>     Run this with
>     mpiexec -np 2 ./two_procs
>
>     We can see that rank 0 waits at MPI_Finalize until rank 1 reaches
>     it, even though both rank 0 and rank 1 call MPI_Comm_disconnect. My
>     understanding is that rank 0 should have finished without having to
>     wait for rank 1.
>
>     Also, MPI_Comm_disconnect cannot be called on MPI_COMM_WORLD.
>
>     I have another minimal example that involves nameserver lookup, but
>     I think this example demonstrates the issue for now. Hope this helps.
>
>
>     On Mon, Feb 29, 2016 at 5:39 PM, Halim Amer <aamer at anl.gov> wrote:
>
>         MPI_Finalize is collective over a set of connected processes. If
>         the server hangs in MPI_Finalize, it means that it is still
>         connected to a subset of the clients. It is difficult to know
>         the reason without a concrete piece of code. If you send us a
>         minimal example that reproduces the problem, we might be able to
>         identify the issue.
>
>         --Halim
>
>         www.mcs.anl.gov/~aamer
>
>         On 2/29/16 9:00 AM, K. N. Ramachandran wrote:
>
>             Hello all,
>
>             I had tried just calling MPI_Comm_disconnect instead of
>             MPI_Comm_split.
>
>             I tried this on just the server side, as well as on both the
>             server and client, but I still see the busy-wait at
>             MPI_Finalize on the server side. Can anyone offer further
>             input on this?
>
>             It looks like the server process should be able to terminate
>             early, but is held up by the client, even though they should
>             be disconnected from each other.
>
>
>
>             On Sat, Feb 27, 2016 at 7:00 PM, K. N. Ramachandran
>             <knram06 at gmail.com> wrote:
>
>                  Hi Pavan,
>
>                  Thank you for the reply. I have presented only a very
>                  simplified case of one server and one client, and that
>                  is why the problem looks strange.
>
>                  The general case is one server acting as a meeting
>                  point: N clients join the server, and one
>                  intracommunicator is formed among them all. The server
>                  then splits off and terminates, leaving the
>                  intracommunicator so the clients can work amongst
>                  themselves.
>
>                  I had also tried MPI_Comm_disconnect on the server
>                  after calling MPI_Comm_split, but even in that case the
>                  server busy-waits for the client at MPI_Finalize. The
>                  single-server, single-client case was only to
>                  demonstrate the problem I am facing.
>
>                  Please let me know if you need more information.
>                  Thanks.
>
>                  On Sat, Feb 27, 2016 at 11:59 AM, Balaji, Pavan
>                  <balaji at anl.gov> wrote:
>
>
>                      It's unclear what exactly you are trying to do
>                      here.  Why are the clients connecting to the server
>                      and then immediately "splitting off"?
>
>                      Your "split-off" functionality needs to be
>                      implemented using MPI_Comm_disconnect, not
>                      MPI_Comm_split.  Comm_split divides a communicator
>                      into smaller communicators, but all processes are
>                      still very much connected.  So as long as the
>                      server process is connected to the client
>                      processes, it might still receive messages from
>                      them and thus cannot simply exit.  Comm_disconnect,
>                      on the other hand, disconnects the client processes
>                      from the server processes.
>
>                      But then again, I have no idea why you are
>                      connecting to the server and disconnecting
>                      immediately.
>
>                         -- Pavan
>
>                       > On Feb 26, 2016, at 5:31 PM, K. N. Ramachandran
>                      <knram06 at gmail.com> wrote:
>                       >
>                       > Hello all,
>                       >
>                       > I have recently begun working on a project that
>                       > uses MPICH-3.2, and I am trying to resolve an
>                       > issue where a server process busy-waits at
>                       > MPI_Finalize.
>                       >
>                       > We are trying to create a server process that
>                       > accepts incoming connections from a known number
>                       > of clients (say, N clients), forms a new
>                       > communicator amongst everyone (server and
>                       > clients), and then splits itself from the group
>                       > and terminates, so that the clients only work
>                       > with each other.
>                       >
>                       > For very problem-specific reasons, we cannot do
>                       > 'mpiexec -np N (other args)'
>                       >
>                       > So we have a server that publishes a service
>                       > name to a nameserver, and clients look up the
>                       > name to join the server. The server and client
>                       > processes are started with separate calls to
>                       > mpiexec: one to start the server and N more to
>                       > start the clients.
>                       >
>                       > The server process busy-waits at the
>                       > MPI_Finalize call after it splits from the
>                       > communicator, and only finishes when all the
>                       > clients reach their MPI_Finalize too.
>                       >
>                       > Consider a simplified case of only one server
>                       > and one client. The simplified pseudocode is:
>                       >
>                       > Server process:
>                       > MPI_Init();
>                       > MPI_Open_port(...);
>                       > MPI_Publish_name(...); // publish service name to nameserver
>                       >
>                       > MPI_Comm_accept(...); // accept incoming connection into intercomm
>                       > MPI_Intercomm_merge(...); // merge new client into intracomm
>                       >
>                       > // now split the server from the client
>                       > MPI_Comm_rank(intracomm, &rank); // rank == 0
>                       > MPI_Comm_split(intracomm, (rank == 0), rank, &lone_comm);
>                       >
>                       > MPI_Finalize(); // busy-waits here until the client's sleep ends
>                       >
>                       > Client process: (simplified - assuming only one client connects)
>                       > MPI_Init();
>                       > MPI_Lookup_name(...);
>                       > MPI_Comm_connect(...);
>                       >
>                       > // merge with server
>                       > MPI_Intercomm_merge(...);
>                       >
>                       > // get rank and split
>                       > MPI_Comm_rank(intracomm, &rank); // rank == 1
>                       > MPI_Comm_split(intracomm, (rank == 0), rank, &lone_comm);
>                       >
>                       > sleep(10); // 10-second sleep - causes the server to busy-wait at MPI_Finalize
>                       >
>                       > MPI_Finalize(); // server and client finish here
>                       >
>                       > So my questions are:
>                       >
>                       > 1) Is the busy-wait at MPI_Finalize expected
>                       > behaviour?
>                       >
>                       > 2) How do I truly "disconnect" the server, so
>                       > that it can end immediately at MPI_Finalize()? I
>                       > tried MPI_Comm_disconnect (and also
>                       > MPI_Comm_free) on both the server and client,
>                       > but that didn't help.
>                       >
>                       > 3) We don't want the server process consuming
>                       > one core at 100% while it waits at MPI_Finalize.
>                       > Are there alternatives apart from making the
>                       > server process sleep, wake up, and keep polling
>                       > a client before finally calling MPI_Finalize?
>                       >
>                       > Thank you for any input you can give here.
>                       >
>                       >
>                       > Regards,
>                       > K.N.Ramachandran
>                       > _______________________________________________
>                       > discuss mailing list discuss at mpich.org
>                       > To manage subscription options or unsubscribe:
>                       > https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
>
>
>                  Regards,
>                  K.N.Ramachandran
>
>
>
>
>             Regards,
>             K.N.Ramachandran
>
>
>
>
>
>
>
>     Thanks,
>     K.N.Ramachandran
>
>
>
>
> --
> K.N.Ramachandran
> Ph: 814-441-4279
>
>
>
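
For reference, the accept/merge/disconnect pattern discussed in this thread can be sketched as follows on the server side. This is illustrative only: the service name "server" and the variable names are my assumptions, not the poster's actual code. Note that the clients must be launched by a separate mpiexec; processes started under the same mpiexec remain connected through MPI_COMM_WORLD regardless of any disconnects.

```c
#include <mpi.h>

/* Server-side sketch (illustrative): accept one client, merge, then
 * disconnect every shared communicator so MPI_Finalize need not wait.
 * The client side mirrors this with MPI_Lookup_name, MPI_Comm_connect,
 * MPI_Intercomm_merge(..., 1, ...), and the same two disconnects. */
int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm, intracomm;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("server", MPI_INFO_NULL, port); /* assumed name */

    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
    MPI_Intercomm_merge(intercomm, 0 /* server group low */, &intracomm);

    /* ... exchange whatever setup information the clients need ... */

    /* Disconnect EVERY communicator shared with the clients; a single
     * remaining one keeps the processes connected at MPI_Finalize. */
    MPI_Comm_disconnect(&intracomm);
    MPI_Comm_disconnect(&intercomm);

    MPI_Unpublish_name("server", MPI_INFO_NULL, port);
    MPI_Close_port(port);
    MPI_Finalize(); /* should no longer need to wait for the clients */
    return 0;
}
```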

