[mpich-discuss] Creating an MPI job using MPI_Comm_connect/accept

Mohammad Javad Rashti mjrashti at gmail.com
Tue Dec 29 11:44:46 CST 2015


Hi,
Using mpich-3.1.2, we are trying to create a multi-process, multi-node MPI
job with the client-server model, but we are having trouble creating the
global communicator we need.

We cannot use mpiexec to launch the MPI processes; they are launched by a
different daemon and we want them to join a group and use MPI after they
are launched.
We chose to use a server to publish a name/port and wait for a known number
of clients to connect. The goal is to create an intracommunicator among all
the clients and the server and then start normal MPI communication (we are
not sure whether there is a better way to accomplish this).

*The problem* is that the first client connects fine, but the subsequent
clients block.

The *simplified method* that we are using is below; a C sketch of each side
follows its pseudocode:

* ------------------- Server -----------------*

- Call *MPI_Open_port(MPI_INFO_NULL, port_name)*

- Call *MPI_Publish_name(service_name, MPI_INFO_NULL, port_name)*

- clients = 0

Loop until clients = MAX_CLIENTS:

   if ( !clients )
       - Call *MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)*

   else
       - Call *MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)*

   - Call *MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)*

   - previous_iacomm = new_iacomm

   - clients ++

end Loop
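
For reference, here is a minimal C sketch of the server loop above (a
simplified illustration only, not our actual code; MAX_CLIENTS and
service_name are placeholders, and the first-iteration if/else is folded
into the initial value of previous_iacomm):

#include <mpi.h>

#define MAX_CLIENTS 4                              /* placeholder */
static const char *service_name = "my_service";    /* placeholder */

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm new_ircomm, new_iacomm;
    MPI_Comm previous_iacomm = MPI_COMM_SELF;  /* first accept uses MPI_COMM_SELF */
    int clients;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Publish_name(service_name, MPI_INFO_NULL, port_name);

    for (clients = 0; clients < MAX_CLIENTS; clients++) {
        /* Collective over previous_iacomm; root 0 is the server. */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm);
        /* The accepting side merges with high = 0. */
        MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);
        previous_iacomm = new_iacomm;
    }

    MPI_Unpublish_name(service_name, MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}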

*---------------- Client ---------------*

- Call *MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name)*

- Call *MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)*

- Call *MPI_Intercomm_merge(new_ircomm, 1, &new_iacomm)*

- previous_iacomm = new_iacomm

Loop for all clients connecting after me:

    - Call *MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)*

    - Call *MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)*

    - previous_iacomm = new_iacomm

end Loop
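
And a matching C sketch of the client side (again only an illustration;
my_index is a hypothetical parameter for how many clients connected before
this one, which in our setup would have to come from the launching daemon):

#include <mpi.h>

#define MAX_CLIENTS 4                              /* placeholder */
static const char *service_name = "my_service";    /* placeholder */

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm new_ircomm, new_iacomm, previous_iacomm;
    int my_index = 0;   /* hypothetical: number of clients that connected before me */
    int later;

    MPI_Init(&argc, &argv);
    MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name);
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm);
    /* The connecting client merges with high = 1. */
    MPI_Intercomm_merge(new_ircomm, 1, &new_iacomm);
    previous_iacomm = new_iacomm;

    /* Participate in the accept for every client that connects after me. */
    for (later = my_index + 1; later < MAX_CLIENTS; later++) {
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm);
        MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);
        previous_iacomm = new_iacomm;
    }

    MPI_Finalize();
    return 0;
}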

----------------------------------

*Note* that the MPI standard states that MPI_Comm_accept is collective over
the calling communicator; that is why it is called by the server and all
previously connected clients.

*The problem we are having* is that the first client connects fine, but the
subsequent clients block on MPI_Comm_connect. Also, the server and
previously connected clients block on MPI_Comm_accept.

(The server does not block only if we use MPI_COMM_SELF for all the accept
calls, but that does not help us create the global intracomm that we want.)

I suspect that we are missing something in our usage of MPI_Comm_accept.
Any insight is helpful and appreciated. I can send the actual C code if
needed.

Thanks
Mohammad