[mpich-discuss] MPI_Comm_connect issue

Min Si msi at il.is.s.u-tokyo.ac.jp
Thu May 12 11:05:03 CDT 2016


Hi Hirak,

In your scenario, the hang on the client side is not a bug (though yes, 
it is not user-friendly).

Here is the description of MPI_Comm_connect in the MPI 3.0 standard:
>
> If the named port does not exist (or has been closed), 
> MPI_COMM_CONNECT raises an error of class MPI_ERR_PORT.
>
This is the run_good case in your example.
>
> If the port exists, but does not have a pending MPI_COMM_ACCEPT, the 
> connection attempt will eventually time out after an 
> implementation-defined time, or succeed when the server calls 
> MPI_COMM_ACCEPT. In the case of a time out, MPI_COMM_CONNECT raises an 
> error of class MPI_ERR_PORT.
>
This is the run_bad case. The "eventually time out" is implementation 
dependent. However, I just glanced over the code, and it seems MPICH 
currently does not handle a user-specified timeout through the info 
argument. I will look into the code and keep you updated.

Min

On 5/11/16 11:39 PM, Roy, Hirak wrote:
>
> Hi mpich team,
>
> I am using MPICH 3.0.4 sock with dynamic connection (accept/connect).
>
> I am facing issues with the following scenario, when there is a race 
> condition in MPI_Comm_connect.
>
> Here is the case
>
> 1>Server opens a port, writes the port info in a file, calls 
> MPI_Comm_accept
>
> 2>Client1 & Client2 read the port information from the file and each 
> client calls MPI_Comm_connect
>
> 3>Server accepts one client, disconnects, closes the port, exits
>
> 4>One of the clients successfully connects, disconnects, exits
>
> 5>The other client hangs in MPI_Comm_connect
>
> If you do the following steps, you can reproduce the issue
>
> 1>Please set compiler and installation path of MPICH in makefile
>
> 2>make : compiles
>
> 3>make run_bad
>
> I have noticed that if a client calls MPI_Comm_connect after the 
> port is closed, the connect call terminates properly with an error 
> (make run_good).
>
> Please let me know whether it is a bug or not.
>
> Please also let me know if the connect call can be configured with a 
> timeout or not.
>
> Thanks,
>
> Hirak
>
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
