[mpich-discuss] Creating an MPI job using MPI_Comm_connect/accept

Seo, Sangmin sseo at anl.gov
Tue Jan 5 09:47:59 CST 2016


Hi Mohammad,

I was wrong in my answer. The same issue was discussed last year, and the problem was fixed after mpich-3.1.3. Please refer to http://lists.mpich.org/pipermail/discuss/2015-January/003660.html

Can you try your code with a recent version of mpich?

Regards,
Sangmin


On Jan 4, 2016, at 8:09 PM, Seo, Sangmin <sseo at anl.gov> wrote:

Hi Mohammad,

It seems the same port name can be used only once, since the MPI 3.1 standard (p. 419, line 31) says "A port name may be reused after it is freed with MPI_CLOSE_PORT and released by the system." Can you try closing the port and opening it again to establish a new connection? If that doesn't work, could you send us your actual code (if possible, a simplified version)?
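For illustration, the close-and-reopen pattern suggested above might look roughly like the sketch below. This is an untested outline, not a verified fix; the names service_name and max_clients are placeholders mirroring the pseudocode later in this thread.

```c
/* Untested sketch: close the port after each accept and open a fresh
 * one before the next, so no port name is reused before being freed.
 * service_name and max_clients are hypothetical placeholders. */
#include <mpi.h>

void accept_clients(const char *service_name, int max_clients)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    for (int i = 0; i < max_clients; i++) {
        MPI_Open_port(MPI_INFO_NULL, port_name);               /* fresh port each time */
        MPI_Publish_name(service_name, MPI_INFO_NULL, port_name);

        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

        MPI_Unpublish_name(service_name, MPI_INFO_NULL, port_name);
        MPI_Close_port(port_name);                             /* release before reuse */

        /* ... merge or communicate over 'client' here ... */
    }
}
```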

Regards,
Sangmin


On Dec 29, 2015, at 11:44 AM, Mohammad Javad Rashti <mjrashti at gmail.com> wrote:

Hi,
Using mpich-3.1.2, we are trying to create a multi-process multi-node MPI job with the client-server model but we are having issues creating the global communicator we need.

We cannot use mpiexec to launch the MPI processes; they are launched by a different daemon and we want them to join a group and use MPI after they are launched.
We chose to use a server to publish a name/port and wait on a known number of clients to connect. The goal is to create an intracommunicator among all the clients and the server, and start normal MPI communication (not sure whether there is a better way to accomplish this goal?).

The problem is that the first client connects fine, but the subsequent clients block.

The simplified method that we are using is here:

 ------------------- Server -----------------

- Call MPI_Open_port(MPI_INFO_NULL, port_name)

- Call MPI_Publish_name(service_name, MPI_INFO_NULL, port_name)

- clients = 0

Loop until clients = MAX_CLIENTS:

   if ( !clients )
       - Call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)

   else
       - Call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)

   - Call MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)

   - previous_iacomm = new_iacomm

   - clients ++

end Loop

---------------- Client ---------------

- Call MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name)

- Call MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)

- Call MPI_Intercomm_merge(new_ircomm, 1 , &new_iacomm)

- previous_iacomm = new_iacomm

Loop for all clients connecting after me:

    - Call MPI_Comm_accept(port_name,MPI_INFO_NULL,0,previous_iacomm,&new_ircomm)

    - Call MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)

    - previous_iacomm = new_iacomm

end Loop

----------------------------------
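For concreteness, the server-side steps above might be written in C roughly as follows. This is an untested sketch that mirrors the pseudocode (the service name "my_service" and MAX_CLIENTS are placeholders); the client side would mirror it with MPI_Comm_connect for the first step and high=1 in the first merge.

```c
/* Untested C sketch of the server-side pseudocode above.
 * Each accepted client is merged into a growing intracommunicator. */
#include <mpi.h>

#define MAX_CLIENTS 4   /* placeholder for the known client count */

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm new_ircomm, new_iacomm;
    MPI_Comm previous_iacomm = MPI_COMM_SELF;  /* first accept uses SELF */

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Publish_name("my_service", MPI_INFO_NULL, port_name);

    for (int clients = 0; clients < MAX_CLIENTS; clients++) {
        /* Collective over previous_iacomm: the server and every
         * already-connected client must call accept together. */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0,
                        previous_iacomm, &new_ircomm);
        MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);  /* server side low */
        previous_iacomm = new_iacomm;
    }

    MPI_Unpublish_name("my_service", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```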

Note that the MPI standard states that MPI_Comm_accept is collective over the calling communicator; that is why it is called by the server and all previously connected clients.

The problem we are having is that the first client connects fine, but subsequent clients block on MPI_Comm_connect, while the server and the previously connected clients block on MPI_Comm_accept.

(The server does not block only if we use MPI_COMM_SELF for all accept calls, but that does not help us create the global intracommunicator we want.)

I suspect that we are missing something in our usage of MPI_Comm_accept. Any insight is helpful and appreciated. I can send the actual C code if needed.

Thanks
Mohammad
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss
