[mpich-discuss] MPI process killed and SIGUSR1

Roy, Hirak Hirak_Roy at mentor.com
Thu Oct 9 11:09:37 CDT 2014


Hi Pavan,



I was just wondering whether the current release offers any way to notify the server that the client has terminated unexpectedly.

Another point: when do we expect the MPI-4 release to be out?



Thanks,

Hirak





Please don't rely on this feature.  We are preparing for MPI-4 Fault Tolerance and are in the process of reworking a lot of this code.  The feature might or might not exist in the future, so keep that in mind if you are planning to use it in production code.



  - Pavan



On Oct 9, 2014, at 10:57 AM, Roy, Hirak <Hirak_Roy at mentor.com> wrote:



>

> Hi Sangmin,

>

> The MPICH README says the following:

>

> FAILURE NOTIFICATION: THIS IS AN UNSUPPORTED FEATURE AND WILL

> ALMOST CERTAINLY CHANGE IN THE FUTURE!

>

>    In the current release, hydra notifies the MPICH library of failed

>    processes by sending a SIGUSR1 signal.  The application can catch

>    this signal to be notified of failed processes.  If the application

>    replaces the library's signal handler with its own, the application

>    must be sure to call the library's handler from its own

>    handler.  Note that you cannot call any MPI function from inside a

>    signal handler.

>
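
The chaining that the README passage describes could look roughly like the sketch below. It is not MPICH code: `library_handler` and `install_library_standin` are hypothetical stand-ins for whatever handler the library installs, and the sketch assumes that handler is already in place when the application installs its own. It uses POSIX `sigaction` to save the previous handler so the application's handler can call it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Action that was installed for the signal before ours replaced it. */
static struct sigaction previous_action;

/* Counters so the effect of each handler can be observed. */
static volatile sig_atomic_t library_calls = 0;
static volatile sig_atomic_t app_calls = 0;

/* Hypothetical stand-in for the handler the library would install. */
static void library_handler(int signo)
{
    (void)signo;
    library_calls++;
}

/* Hypothetical helper: installs the stand-in, playing the library's role. */
static int install_library_standin(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = library_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Application handler: records the event, then chains to the old handler. */
static void app_handler(int signo)
{
    app_calls++;
    /* Only chain to a real one-argument handler. SIG_DFL, SIG_IGN and
     * three-argument SA_SIGINFO handlers are not chained in this sketch. */
    if ((previous_action.sa_flags & SA_SIGINFO) == 0 &&
        previous_action.sa_handler != SIG_DFL &&
        previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(signo);
    }
}

/* Replace the current handler for signo and remember the old one. */
static int install_chained_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = app_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, &previous_action);
}
```

Calling `install_library_standin(SIGUSR1)`, then `install_chained_handler(SIGUSR1)`, then `raise(SIGUSR1)` should run both handlers once each. Neither handler calls any MPI function, in line with the README's restriction.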

> If this is true, shouldn't I expect to receive SIGUSR1?

>

>

> Thanks,

> Hirak

> First of all, MPI functions are not signal-safe, so if you try to use signals within your MPI program, things might break.

>

> - Sangmin
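
Because MPI functions are not signal-safe, one common way to react to a signal is to have the handler do nothing but set a flag, and to do the real work later from normal program flow. The sketch below shows that general POSIX pattern; it is not taken from MPICH, and the names `failure_flag`, `on_sigusr1`, `install_flag_handler` and `consume_failure_notification` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, read and cleared by the main program flow. */
static volatile sig_atomic_t failure_flag = 0;

/* The handler only writes a sig_atomic_t flag; it calls no MPI function. */
static void on_sigusr1(int signo)
{
    (void)signo;
    failure_flag = 1;
}

/* Install the flag-setting handler for SIGUSR1. Returns 0 on success. */
static int install_flag_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call from the normal program loop, outside any handler. Returns 1 if a
 * notification arrived since the last call, 0 otherwise. Several signals
 * arriving between two calls are reported as a single notification. */
static int consume_failure_notification(void)
{
    if (failure_flag) {
        failure_flag = 0;
        return 1;
    }
    return 0;
}
```

A program would poll `consume_failure_notification()` between communication steps and decide there how to respond, so that any follow-up work happens outside the signal handler.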

>

>

> On Oct 9, 2014, at 7:37 AM, Roy, Hirak <Hirak_Roy at mentor.com> wrote:

>

> Hi,

>

> I have two MPI processes (server and client), launched independently by two different mpiexec commands (mpich-3.0.4, sock device):

> 1>    mpiexec -disable-auto-cleanup -n 1 ./server

> 2>    mpiexec -disable-auto-cleanup -n 1 ./client

>

> The server opens a port and does MPI_Comm_accept.

> The client gets the port information and calls MPI_Comm_connect, which gives us a new intercommunicator.

> I don't call MPI_Intercomm_merge.

>

> I install my own signal handler for SIGUSR1 even before I call MPI_Init (I guess this will automatically chain the signal handlers).

>

> >> signal (SIGUSR1, mysignalhandler);

>

> Now suppose the 'client' process gets killed (I forcefully kill it with signal 9). I thought I would then get SIGUSR1 in the 'server' process.

> However, I don't get any signal in the 'server' process.

> Am I doing something wrong?

> I have noticed that if I start 4 client processes with a single mpiexec command and one client gets killed, the remaining 3 clients receive SIGUSR1.

>

> Does this mean that SIGUSR1 is not forwarded between processes connected through an intercommunicator?

>

>

> Thanks,

> Hirak
