[mpich-discuss] Creating an MPI job using MPI_Comm_connect/accept
Seo, Sangmin
sseo at anl.gov
Mon Jan 4 20:09:19 CST 2016
Hi Mohammad,
It seems the same port name can be used only once, since the MPI 3.1 standard (p. 419, line 31) says "A port name may be reused after it is freed with MPI_CLOSE_PORT and released by the system." Can you try closing the port and opening it again to establish a new connection? If that doesn't work, could you send us your actual code (if possible, a simplified version)?
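For reference, the pattern I am suggesting looks roughly like the sketch below: close and reopen the port between connections, re-publishing the fresh port name each time. This is an untested sketch (the function name and the re-publish step are mine, and error handling is omitted); it needs to run under an MPI launcher.

```c
/* Sketch: reuse a port by closing and reopening it between accepts,
 * per MPI 3.1 p. 419 ("A port name may be reused after it is freed
 * with MPI_CLOSE_PORT..."). Untested; error handling omitted. */
#include <mpi.h>

void accept_with_fresh_port(const char *service_name, int max_clients)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm newcomm;

    for (int i = 0; i < max_clients; i++) {
        MPI_Open_port(MPI_INFO_NULL, port_name);
        /* Re-publish so the next client's MPI_Lookup_name sees the
         * fresh port rather than the one that was just closed. */
        MPI_Publish_name(service_name, MPI_INFO_NULL, port_name);
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
                        &newcomm);
        MPI_Unpublish_name(service_name, MPI_INFO_NULL, port_name);
        MPI_Close_port(port_name);
        /* ... merge newcomm into the growing intracommunicator ... */
    }
}
```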
Regards,
Sangmin
On Dec 29, 2015, at 11:44 AM, Mohammad Javad Rashti <mjrashti at gmail.com> wrote:
Hi,
Using mpich-3.1.2, we are trying to create a multi-process multi-node MPI job with the client-server model but we are having issues creating the global communicator we need.
We cannot use mpiexec to launch the MPI processes; they are launched by a different daemon and we want them to join a group and use MPI after they are launched.
We chose to use a server that publishes a name/port and waits for a known number of clients to connect. The goal is to create an intracommunicator among all the clients and the server, and then start normal MPI communication (we are not sure whether there is a better way to accomplish this).
The problem is that the first client connects fine, but the subsequent clients block.
Here is a simplified version of the method we are using:
------------------- Server -----------------
- Call MPI_Open_port(MPI_INFO_NULL, port_name)
- Call MPI_Publish_name(service_name, MPI_INFO_NULL, port_name)
- clients = 0
Loop until clients = MAX_CLIENTS:
    if ( !clients )
        - Call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)
    else
        - Call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)
    - Call MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)
    - previous_iacomm = new_iacomm
    - clients++
end Loop
---------------- Client ---------------
- Call MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name)
- Call MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)
- Call MPI_Intercomm_merge(new_ircomm, 1 , &new_iacomm)
- previous_iacomm = new_iacomm
Loop for all clients connecting after me:
    - Call MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)
    - Call MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)
    - previous_iacomm = new_iacomm
end Loop
----------------------------------
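In C, the server side of the pseudocode above corresponds roughly to the following sketch. It is untested and mirrors the posted logic, including the accept pattern that hangs for us; the service name "my_service" and MAX_CLIENTS value are placeholders, and error handling is omitted.

```c
/* Server-side sketch of the pseudocode above (untested; reproduces
 * the posted logic, including the call pattern that blocks for us). */
#include <mpi.h>

#define MAX_CLIENTS 4 /* placeholder value */

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm new_ircomm;
    MPI_Comm previous_iacomm = MPI_COMM_SELF;

    MPI_Init(&argc, &argv);
    MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Publish_name("my_service", MPI_INFO_NULL, port_name);

    for (int clients = 0; clients < MAX_CLIENTS; clients++) {
        /* First accept is over MPI_COMM_SELF; later accepts are
         * collective over the intracommunicator merged so far. */
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0,
                        clients == 0 ? MPI_COMM_SELF : previous_iacomm,
                        &new_ircomm);
        /* Server side uses high = 0 in the merge. */
        MPI_Intercomm_merge(new_ircomm, 0, &previous_iacomm);
    }

    MPI_Unpublish_name("my_service", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}
```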
Note that the MPI standard states that MPI_Comm_accept is collective over the calling communicator; that is why it is called by the server and all previously connected clients.
The problem we are having is that the first client connects fine, but subsequent clients block on MPI_Comm_connect, while the server and the previously connected clients block on MPI_Comm_accept. (The server does not block only if we use MPI_COMM_SELF for all accept calls, but that does not help us create the global intracommunicator we want.)
I suspect we are missing something in our usage of MPI_Comm_accept. Any insight is helpful and appreciated. I can send the actual C code if needed.
Thanks
Mohammad
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss