[mpich-discuss] Creating an MPI job using MPI_Comm_connect/accept

Mohammad Javad Rashti mjrashti at gmail.com
Thu Jan 14 14:44:17 CST 2016


Sangmin,
MPICH 3.2 works well and does not seem to have this issue.

Thank you

On Tue, Jan 5, 2016 at 10:56 AM, Mohammad Javad Rashti <mjrashti at gmail.com>
wrote:

> Thanks Sangmin, sure, I will check it out.
>
> On Tue, Jan 5, 2016 at 10:47 AM, Seo, Sangmin <sseo at anl.gov> wrote:
>
>> Hi Mohammad,
>>
>> I was wrong in my answer. The same issue was discussed last year, and the
>> problem was fixed after mpich-3.1.3. Please refer to
>> http://lists.mpich.org/pipermail/discuss/2015-January/003660.html
>>
>> Can you try your code with a recent version of mpich?
>>
>> Regards,
>> Sangmin
>>
>>
>> On Jan 4, 2016, at 8:09 PM, Seo, Sangmin <sseo at anl.gov> wrote:
>>
>> Hi Mohammad,
>>
>> It seems the same port name can be used only once, since the MPI 3.1
>> standard (p. 419, line 31) says "A port name may be reused after it is
>> freed with MPI_CLOSE_PORT and released by the system." Can you try closing
>> the port and opening it again to establish a new connection, along the
>> lines of the sketch below? If that doesn't work, could you send us your
>> actual code (a simplified version, if possible)?
>>
>> Regards,
>> Sangmin
>>
>>
>> On Dec 29, 2015, at 11:44 AM, Mohammad Javad Rashti <mjrashti at gmail.com>
>> wrote:
>>
>> Hi,
>> Using mpich-3.1.2, we are trying to create a multi-process multi-node MPI
>> job with the client-server model but we are having issues creating the
>> global communicator we need.
>>
>> We cannot use mpiexec to launch the MPI processes; they are launched by a
>> different daemon and we want them to join a group and use MPI after they
>> are launched.
>> We chose to use a server to publish a name/port and wait on a known
>> number of clients to connect. The goal is to create an intracommunicator
>> among all the clients and the server, and then start normal MPI
>> communication (we are not sure whether there is a better way to accomplish
>> this goal).
>>
>> *The problem* is that the first client connects fine, but the subsequent
>> clients block.
>>
>> The *simplified method* that we are using is here:
>>
>> * ------------------- Server -----------------*
>>
>> - Call *MPI_Open_port(MPI_INFO_NULL, port_name)*
>>
>> - Call *MPI_Publish_name(service_name, MPI_INFO_NULL, port_name)*
>>
>> - clients = 0
>>
>> Loop until clients = MAX_CLIENTS:
>>
>>    if ( !clients )
>>        - Call
>> *MPI_Comm_accept(port_name,MPI_INFO_NULL,0,MPI_COMM_SELF,&new_ircomm)*
>>
>>    else
>>        - Call
>> *MPI_Comm_accept(port_name,MPI_INFO_NULL,0,previous_iacomm,&new_ircomm)*
>>
>>    - Call *MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)*
>>
>>    - previous_iacomm = new_iacomm
>>
>>    - clients ++
>>
>> end Loop
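>>
>> In plain C, the server loop above looks roughly like this (just a sketch;
>> error checking is omitted and MAX_CLIENTS / service_name stand in for our
>> real values):
>>
>>     char port_name[MPI_MAX_PORT_NAME];
>>     MPI_Comm new_ircomm, new_iacomm, previous_iacomm = MPI_COMM_SELF;
>>
>>     MPI_Open_port(MPI_INFO_NULL, port_name);
>>     MPI_Publish_name(service_name, MPI_INFO_NULL, port_name);
>>
>>     for (int clients = 0; clients < MAX_CLIENTS; clients++) {
>>         /* collective over previous_iacomm (MPI_COMM_SELF for the first client) */
>>         MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm);
>>         /* existing group takes the low half of the merged intracommunicator */
>>         MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);
>>         previous_iacomm = new_iacomm;
>>     }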
>>
>> *---------------- Client ---------------*
>>
>> - Call *MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name)*
>>
>> - Call *MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF,
>> &new_ircomm)*
>>
>> - Call *MPI_Intercomm_merge(new_ircomm, 1, &new_iacomm)*
>>
>> - previous_iacomm = new_iacomm
>>
>> Loop for all clients connecting after me:
>>
>>     - Call
>> *MPI_Comm_accept(port_name,MPI_INFO_NULL,0,previous_iacomm,&new_ircomm)*
>>
>>     - Call *MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm)*
>>
>>     - previous_iacomm = new_iacomm
>>
>> end Loop
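>>
>> And the corresponding client side, roughly (clients_after_me is a
>> placeholder for the number of clients that connect after this one):
>>
>>     char port_name[MPI_MAX_PORT_NAME];
>>     MPI_Comm new_ircomm, new_iacomm, previous_iacomm;
>>
>>     MPI_Lookup_name(service_name, MPI_INFO_NULL, port_name);
>>     MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &new_ircomm);
>>     /* newcomer takes the high half of the merged intracommunicator */
>>     MPI_Intercomm_merge(new_ircomm, 1, &new_iacomm);
>>     previous_iacomm = new_iacomm;
>>
>>     for (int i = 0; i < clients_after_me; i++) {
>>         /* join the collective accept for every later client */
>>         MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, previous_iacomm, &new_ircomm);
>>         MPI_Intercomm_merge(new_ircomm, 0, &new_iacomm);
>>         previous_iacomm = new_iacomm;
>>     }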
>>
>> ----------------------------------
>>
>> *Note* that the MPI standard states that MPI_Comm_accept is collective over
>> the calling communicator; that is why it is called by the server and all
>> previously connected clients.
>>
>> *The problem we are having* is that the first client connects fine, but
>> the subsequent clients block on MPI_Comm_connect. Also, the server and
>> previously connected clients block on MPI_Comm_accept.
>>
>> (The server does not block only if we use MPI_COMM_SELF for all accept
>> calls, but that does not help us create the global intracomm that we want.)
>>
>> I suspect that we are missing something in our usage of MPI_Comm_accept.
>> Any insight is helpful and appreciated. I can send the actual C code if
>> needed.
>>
>> Thanks
>> Mohammad
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


More information about the discuss mailing list