<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body dir="auto"><div>You will only receive the notification if you use a communicating call. MPI_Comm_remote_size doesn't do any communication, so it won't return the error. <br><br><br></div><div><br>On Oct 9, 2014, at 11:37 AM, Roy, Hirak <<a href="mailto:Hirak_Roy@mentor.com">Hirak_Roy@mentor.com</a>> wrote:<br><br></div><blockquote type="cite"><div>
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Courier New";}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoNormal">Hi Pavan and Wesley,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks, Pavan, for the information.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Wesley,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If I call MPI_Comm_remote_size periodically, should I get an error code when the client on the other side of the intercommunicator is killed? (I have already set MPI_ERRORS_RETURN.)<o:p></o:p></p>
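<p class="MsoNormal">Following Wesley's answer that MPI_Comm_remote_size does no communication, here is a minimal sketch of a periodic check that does communicate. It assumes that MPI_ERRORS_RETURN is already set on the intercommunicator, that the peer calls the same routine at matching points, and that MPICH surfaces the failure as a return code; the function name ping_peer and the tag value are hypothetical.<o:p></o:p></p>
<pre><code class="language-c">#include <mpi.h>
#include <stdio.h>

#define HEARTBEAT_TAG 99  /* hypothetical tag reserved for heartbeats */

/* Exchange one int with rank 0 of the remote group. On an
 * intercommunicator, rank 0 names the peer on the other side; on an
 * intracommunicator it names rank 0 of the same group. Both sides must
 * call this at matching points, because MPI_Sendrecv blocks until the
 * exchange completes or MPI reports an error. */
int ping_peer(MPI_Comm comm)
{
    int out = 1, in = 0;
    int rc = MPI_Sendrecv(&out, 1, MPI_INT, 0, HEARTBEAT_TAG,
                          &in, 1, MPI_INT, 0, HEARTBEAT_TAG,
                          comm, MPI_STATUS_IGNORE);
    if (rc != MPI_SUCCESS) {
        /* Only reachable if the comm's error handler returns. */
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "heartbeat failed: %s\n", msg);
    }
    return rc;
}
</code></pre>
<p class="MsoNormal">On the server, you would call MPI_Comm_set_errhandler(intercomm, MPI_ERRORS_RETURN) once after MPI_Comm_accept returns and then call ping_peer(intercomm) periodically, with the client doing the same after MPI_Comm_connect. Because MPI_Sendrecv blocks, a failure that MPICH does not detect would show up as a hang rather than as an error code.<o:p></o:p></p>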
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Hirak<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="3" width="100%" noshade="" style="color:black" align="center">
</div>
<pre style="white-space:pre-wrap;orphans: auto;text-align:start;widows: auto;-webkit-text-stroke-width: 0px;word-spacing:0px"><span style="color:black">MPICH should detect and notify you about process failures without you having to install your own signal handler. You’ll be notified via the return code of your MPI call. You can also use an MPI error handler to catch this notification. So once you’ve set up the intercommunicator between your two processes, you should be able to change the default error handler from MPI_ERRORS_ARE_FATAL to your own custom error handler (or just use MPI_ERRORS_RETURN and check the return codes).<o:p></o:p></span></pre>
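<p class="MsoNormal">A sketch of the custom error handler option, assuming the MPI-2 style MPI_Comm_create_errhandler interface: the handler records the error class and returns instead of aborting, so the failing call can return its error code. The names record_error, last_error_class, and install_recording_handler are hypothetical, and whether a killed client actually triggers the handler still depends on MPICH detecting the failure during a communicating call.<o:p></o:p></p>
<pre><code class="language-c">#include <mpi.h>
#include <stdio.h>

/* Error class of the most recent error seen by the handler;
 * MPI_SUCCESS means nothing has been recorded yet. */
static int last_error_class = MPI_SUCCESS;

/* Matches MPI_Comm_errhandler_function: MPI passes the communicator
 * and a pointer to the error code. Returning (rather than calling
 * MPI_Abort) lets the failing MPI call return to the caller. */
static void record_error(MPI_Comm *comm, int *errcode, ...)
{
    char msg[MPI_MAX_ERROR_STRING];
    int len = 0;
    (void)comm;
    MPI_Error_class(*errcode, &last_error_class);
    MPI_Error_string(*errcode, msg, &len);
    fprintf(stderr, "MPI error on communicator: %s\n", msg);
}

/* Replace the default (fatal) handler on comm with record_error. */
int install_recording_handler(MPI_Comm comm)
{
    MPI_Errhandler handler;
    int rc = MPI_Comm_create_errhandler(record_error, &handler);
    if (rc != MPI_SUCCESS)
        return rc;
    rc = MPI_Comm_set_errhandler(comm, handler);
    /* comm holds its own reference, so our handle can be freed now. */
    MPI_Errhandler_free(&handler);
    return rc;
}
</code></pre>
<p class="MsoNormal">Calling install_recording_handler(intercomm) right after MPI_Comm_accept (or MPI_Comm_connect) changes the handler on that communicator only; MPI_COMM_WORLD keeps its own handler.<o:p></o:p></p>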
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">If you’re asking about the implementation of the proposed fault tolerance features for MPI-4 in MPICH, that’s a work in progress. We hope to have something for the MPICH 3.2 release cycle, but it’s not guaranteed and it will still be a very experimental feature given that the MPI Forum has not yet actually adopted the fault tolerance proposal.<o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">Thanks,<o:p></o:p></span></pre>
<pre><span style="color:black">Wesley<o:p></o:p></span></pre>
<pre><span style="color:black"><o:p> </o:p></span></pre>
<pre><span style="color:black">><i> On Oct 9, 2014, at 11:12 AM, Balaji, Pavan <<a href="https://lists.mpich.org/mailman/listinfo/discuss">balaji at anl.gov</a>> wrote:<o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> I’ll let Wesley answer the FT notification part.<o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> MPI-4 is a major standard release of MPI. That’ll take a few years. We are currently working on the MPI-3.1 release (hopefully you know the difference between MPI and MPICH; otherwise it’ll take many emails to explain that part :-) ).<o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> — Pavan<o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> On Oct 9, 2014, at 11:09 AM, Roy, Hirak <<a href="https://lists.mpich.org/mailman/listinfo/discuss">Hirak_Roy at mentor.com</a>> wrote:<o:p></o:p></i></span></pre>
<pre><span style="color:black">><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> Hi Pavan,<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> Just wondering whether, with the current release, there is any way to notify the server that the client has terminated unexpectedly.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> Another point: when do we expect the MPI-4 release to be out?<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> Thanks,<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> Hirak<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> Please don’t rely on this feature. We are preparing for MPI-4 Fault Tolerance and are in the process of reworking much of this code. If you are planning to use it in production code, be aware that it might or might not exist in the future.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> — Pavan<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> On Oct 9, 2014, at 10:57 AM, Roy, Hirak <Hirak_Roy at <a href="http://mentor.com">mentor.com</a>> wrote:<o:p></o:p></i></span></pre>
<pre><span style="color:black">>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Hi Sangmin,<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> The MPICH README says the following:<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> FAILURE NOTIFICATION: THIS IS AN UNSUPPORTED FEATURE AND WILL<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> ALMOST CERTAINLY CHANGE IN THE FUTURE!<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> In the current release, hydra notifies the MPICH library of failed<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> processes by sending a SIGUSR1 signal. The application can catch<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> this signal to be notified of failed processes. If the application<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> replaces the library's signal handler with its own, the application<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> must be sure to call the library's handler from its own<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> handler. Note that you cannot call any MPI function from inside a<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> signal handler.<o:p></o:p></i></span></pre>
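<p class="MsoNormal">The README's instruction to call the library's handler from your own can be followed with sigaction, which hands back the previously installed action when you install a new one. This is a minimal POSIX sketch, not MPICH-specific: library_handler is a hypothetical stand-in for whatever handler MPICH installs, and the other names are hypothetical too. As the README says, neither handler may call MPI functions.<o:p></o:p></p>
<pre><code class="language-c">#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Stand-in for the MPI library's SIGUSR1 handler (demonstration only). */
static volatile sig_atomic_t library_saw_signal = 0;
static void library_handler(int sig)
{
    (void)sig;
    library_saw_signal = 1;
}

/* The action that was installed before ours, saved by sigaction. */
static struct sigaction previous_action;

/* Set by our handler; writing a sig_atomic_t is safe in a handler. */
static volatile sig_atomic_t app_saw_signal = 0;

/* Our handler: do the application's async-signal-safe work, then
 * forward to the previously installed handler. */
static void app_handler(int sig, siginfo_t *info, void *ctx)
{
    app_saw_signal = 1;
    if (previous_action.sa_flags & SA_SIGINFO) {
        previous_action.sa_sigaction(sig, info, ctx);
    } else if (previous_action.sa_handler != SIG_IGN &&
               previous_action.sa_handler != SIG_DFL) {
        previous_action.sa_handler(sig);
    }
}

/* Install app_handler for sig and remember the previous action.
 * Chaining only reaches a handler installed BEFORE this call, so this
 * must run after the library installs its handler (presumably during
 * MPI_Init). Returns 0 on success, -1 on error (errno set). */
int install_chained_handler(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = app_handler;
    act.sa_flags = SA_SIGINFO;
    return sigaction(sig, &act, &previous_action);
}
</code></pre>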
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> If this is true, shouldn't I expect a SIGUSR1?<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Thanks,<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Hirak<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> First of all, MPI functions are not signal safe. So, if you try to use signals within your MPI program, things might break.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> — Sangmin<o:p></o:p></i></span></pre>
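<p class="MsoNormal">Since MPI functions are not safe to call from a signal handler, one common pattern is to let the handler only set a flag and to act on it from the main loop, where MPI calls are allowed. A minimal POSIX sketch with hypothetical names, assuming a single-threaded process (sigprocmask is unspecified in multithreaded programs, where pthread_sigmask would be used instead).<o:p></o:p></p>
<pre><code class="language-c">#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, read and cleared by the main loop. */
static volatile sig_atomic_t failure_pending = 0;

/* Async-signal-safe: only records that the signal arrived.
 * No MPI calls, no printf, no malloc in here. */
static void on_failure_signal(int sig)
{
    (void)sig;
    failure_pending = 1;
}

/* Install on_failure_signal for sig. Returns 0 on success, -1 on error. */
int install_flag_handler(int sig)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = on_failure_signal;
    sigemptyset(&act.sa_mask);
    return sigaction(sig, &act, NULL);
}

/* Called from the main loop. Blocks sig while reading and clearing the
 * flag so a signal arriving in between is not lost. Returns 1 if the
 * signal arrived since the previous call, 0 otherwise. */
int take_failure_flag(int sig)
{
    sigset_t block, old;
    int seen;
    sigemptyset(&block);
    sigaddset(&block, sig);
    sigprocmask(SIG_BLOCK, &block, &old);
    seen = failure_pending;
    failure_pending = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return seen;
}
</code></pre>
<p class="MsoNormal">In the server's main loop you would call take_failure_flag(SIGUSR1) between iterations and, when it returns 1, make the MPI calls that determine which peer failed.<o:p></o:p></p>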
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> On Oct 9, 2014, at 7:37 AM, Roy, Hirak <Hirak_Roy at <a href="http://mentor.com">mentor.com</a>> wrote:<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Hi ,<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> I have two MPI processes (server and client) launched independently by two different mpiexec commands (mpich-3.0.4, sock device):<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> 1> mpiexec -disable-auto-cleanup -n 1 ./server<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> 2> mpiexec -disable-auto-cleanup -n 1 ./client<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> The server opens a port and calls MPI_Comm_accept.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> The client obtains the port name and calls MPI_Comm_connect, which gives us a new intercommunicator.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> I don’t call MPI_Intercomm_merge.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> I installed my own signal handler for SIGUSR1 even before calling MPI_Init (I assume this will automatically chain the signal handlers):<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i>     signal(SIGUSR1, mysignalhandler);<o:p></o:p></i></span></pre>
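<p class="MsoNormal">Installing a handler before MPI_Init does not by itself guarantee chaining: a process has one disposition per signal, so if the library installs its own SIGUSR1 handler during MPI_Init, yours runs only if the library forwards to it. One way to check is to query the current disposition with sigaction after MPI_Init. A minimal POSIX sketch; handler_is_current is a hypothetical name.<o:p></o:p></p>
<pre><code class="language-c">#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Example handler, named after the one in the message above. */
void mysignalhandler(int sig)
{
    (void)sig; /* only async-signal-safe work belongs here */
}

/* Query, without changing, the disposition of sig (a NULL new action
 * makes sigaction a pure query). Returns 1 if handler is currently
 * installed, 0 if something else is (another handler, SIG_DFL or
 * SIG_IGN), and -1 if sigaction fails. */
int handler_is_current(int sig, void (*handler)(int))
{
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0)
        return -1;
    if (cur.sa_flags & SA_SIGINFO)
        return 0; /* a three-argument handler is installed, not ours */
    return cur.sa_handler == handler;
}
</code></pre>
<p class="MsoNormal">If a call right after MPI_Init returns 0, the library has replaced your handler; you would then need to install yours after MPI_Init and call the library's handler from it, as the README excerpt describes.<o:p></o:p></p>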
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Now suppose the ‘client’ process gets killed (I forcefully kill it with signal 9). I expected the ‘server’ process to receive SIGUSR1.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> However, I don’t get any signal in ‘server’ process.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Am I doing something wrong?<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> I have noticed that if I start 4 client processes with a single mpiexec command and one client gets killed, the remaining 3 clients receive SIGUSR1.<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Does this mean SIGUSR1 is not forwarded across processes connected via an intercommunicator?<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> <o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Thanks,<o:p></o:p></i></span></pre>
<pre><span style="color:black">>>><i> Hirak</i><o:p></o:p></span></pre>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>discuss mailing list <a href="mailto:discuss@mpich.org">discuss@mpich.org</a></span><br><span>To manage subscription options or unsubscribe:</span><br><span><a href="https://lists.mpich.org/mailman/listinfo/discuss">https://lists.mpich.org/mailman/listinfo/discuss</a></span></div></blockquote></body></html>