[mpich-discuss] Creating an MPI job using MPI_Comm_connect/accept

Mohammad Javad Rashti mjrashti at gmail.com
Tue Jan 5 09:56:59 CST 2016


Thanks Sangmin, sure, I will check it out.

On Tue, Jan 5, 2016 at 10:47 AM, Seo, Sangmin <sseo at anl.gov> wrote:

> Hi Mohammad,
>
> I was wrong in my answer. The same issue was discussed last year, and the
> problem was fixed after mpich-3.1.3. Please refer
> http://lists.mpich.org/pipermail/discuss/2015-January/003660.html
>
> Can you try your code with a recent version of mpich?
>
> Regards,
> Sangmin
>
>
> On Jan 4, 2016, at 8:09 PM, Seo, Sangmin <sseo at anl.gov> wrote:
>
> Hi Mohammad,
>
> It seems the same port name can be used only once, since the MPI 3.1
> standard (p. 419, line 31) says “A port name may be reused after it is
> freed with MPI_CLOSE_PORT and released by the system.” Can you try closing
> the port and opening it again to establish a new connection? If that doesn’t
> work, could you send us your actual code (if possible, please send us a
> simplified version)?
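>
> For example, a rough sketch only (reusing the port_name, service_name,
> previous_iacomm, and new_ircomm variables from your pseudocode below), the
> server could free and re-open the port between accepts:
>
>     MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm);
>     /* Free the port and obtain a fresh one before the next connection.
>        Re-publishing is assumed to be needed here because MPI_Open_port
>        may return a different port name each time. */
>     MPI_Unpublish_name(service_name, MPI_INFO_NULL, port_name);
>     MPI_Close_port(port_name);
>     MPI_Open_port(MPI_INFO_NULL, port_name);
>     MPI_Publish_name(service_name, MPI_INFO_NULL, port_name);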
>
> Regards,
> Sangmin
>
>
> On Dec 29, 2015, at 11:44 AM, Mohammad Javad Rashti <mjrashti at gmail.com>
> wrote:
>
> Hi,
> Using mpich-3.1.2, we are trying to create a multi-process, multi-node MPI
> job with the client-server model, but we are having trouble creating the
> global communicator we need.
>
> We cannot use mpiexec to launch the MPI processes; they are launched by a
> different daemon and we want them to join a group and use MPI after they
> are launched.
> We chose to have a server publish a name/port and wait for a known number
> of clients to connect. The goal is to create an intracommunicator among all
> the clients and the server and then start normal MPI communication (we are
> not sure whether there is a better way to accomplish this).
>
> *The problem* is that the first client connects fine, but the subsequent
> clients block.
>
> The *simplified method* that we are using is:
>
> * ------------------- Server -----------------*
>
> - Call *MPI_Open_port(MPI_INFO_NULL, port_name)*
>
> - Call *MPI_Publish_name(service_name, MPI_INFO_NULL, port_name)*
>
> - clients = 0
>
> Loop until clients = MAX_CLIENTS:
>
>    if ( !clients )
>        - Call *MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm)*
>
>    else
>        - Call *MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)*
>
>    - Call *MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)*
>
>    - previous_iacomm = new_iacomm
>
>    - clients ++
>
> end Loop
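>
> In plain C, the server side looks roughly like this (a simplified sketch;
> MAX_CLIENTS and the service name "my_service" are placeholders, and error
> checking is omitted):
>
>     #include <mpi.h>
>
>     #define MAX_CLIENTS 4                 /* placeholder client count */
>
>     int main(int argc, char **argv)
>     {
>         char port_name[MPI_MAX_PORT_NAME];
>         MPI_Comm new_ircomm, new_iacomm;
>         MPI_Comm previous_iacomm = MPI_COMM_SELF;  /* first accept over SELF,
>                                                       as in the if/else above */
>         int clients;
>
>         MPI_Init(&argc, &argv);
>         MPI_Open_port(MPI_INFO_NULL, port_name);
>         MPI_Publish_name("my_service", MPI_INFO_NULL, port_name);
>
>         for (clients = 0; clients < MAX_CLIENTS; clients++) {
>             /* Collective over previous_iacomm: every already-connected
>                client must call the matching accept (see client loop below). */
>             MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm,
>                             &new_ircomm);
>             /* Merge into a larger intracommunicator; the server side
>                passes high = 0. */
>             MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);
>             previous_iacomm = new_iacomm;
>         }
>
>         MPI_Unpublish_name("my_service", MPI_INFO_NULL, port_name);
>         MPI_Close_port(port_name);
>         MPI_Finalize();
>         return 0;
>     }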
>
> *---------------- Client ---------------*
>
> - Call *MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name)*
>
> - Call *MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
> &new_ircomm)*
>
> - Call *MPI_Intercomm_merge(new_ircomm, 1 , &new_iacomm)*
>
> - previous_iacomm = new_iacomm
>
> Loop for all clients connecting after me:
>
>     - Call *MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm)*
>
>     - Call *MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)*
>
>     - previous_iacomm = new_iacomm
>
> end Loop
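>
> And the client side would look roughly like this (again only a sketch; how
> each client learns how many later clients to wait for is an assumption
> here, derived from the size of the merged communicator):
>
>     #include <mpi.h>
>
>     #define MAX_CLIENTS 4                 /* placeholder, must match server */
>
>     int main(int argc, char **argv)
>     {
>         char port_name[MPI_MAX_PORT_NAME];
>         MPI_Comm new_ircomm, new_iacomm, previous_iacomm;
>         int size, remaining, i;
>
>         MPI_Init(&argc, &argv);
>         MPI_Lookup_name("my_service", MPI_INFO_NULL, port_name);
>         MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm);
>         /* The new client is the "high" group in the first merge. */
>         MPI_Intercomm_merge(new_ircomm, 1, &new_iacomm);
>         previous_iacomm = new_iacomm;
>
>         /* Every client connected so far must take part in each later accept,
>            because MPI_Comm_accept is collective over previous_iacomm.
>            Assumed here: the merged size is (server + clients so far), so the
>            number of clients still to come can be derived from it. */
>         MPI_Comm_size(previous_iacomm, &size);
>         remaining = (MAX_CLIENTS + 1) - size;
>         for (i = 0; i < remaining; i++) {
>             MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm,
>                             &new_ircomm);
>             /* Already-connected processes are the "low" group this time. */
>             MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);
>             previous_iacomm = new_iacomm;
>         }
>
>         MPI_Finalize();
>         return 0;
>     }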
>
> ----------------------------------
>
> *Note* that the MPI standard states that MPI_Comm_accept is collective over
> the calling communicator; that is why it is called by the server and by all
> previously connected clients.
>
> *The problem we are having* is that the first client connects fine, but
> the subsequent clients block on MPI_Comm_connect. The server and the
> previously connected clients also block on MPI_Comm_accept.
>
> (The server only avoids blocking if we use MPI_COMM_SELF for all accept
> calls, but that does not help us create the global intracommunicator that
> we want.)
>
> I suspect that we are missing something in our usage of MPI_Comm_accept.
> Any insight is helpful and appreciated. I can send the actual C code if
> needed.
>
> Thanks
> Mohammad
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

