[mpich-discuss] MPI process killed and SIGUSR1

Roy, Hirak Hirak_Roy at mentor.com
Thu Oct 9 10:57:21 CDT 2014


Hi Sangmin,

The readme of mpich says the following :

FAILURE NOTIFICATION: THIS IS AN UNSUPPORTED FEATURE AND WILL
ALMOST CERTAINLY CHANGE IN THE FUTURE!

   In the current release, hydra notifies the MPICH library of failed
   processes by sending a SIGUSR1 signal.  The application can catch
   this signal to be notified of failed processes.  If the application
   replaces the library's signal handler with its own, the application
   must be sure to call the library's handler from its own
   handler.  Note that you cannot call any MPI function from inside a
   signal handler.

If this is true, shouldn't I expect to receive SIGUSR1?


Thanks,
Hirak
________________________________

First of all, MPI functions are not async-signal-safe, so if you try to use signals within your MPI program, things might break.

- Sangmin

On Oct 9, 2014, at 7:37 AM, Roy, Hirak <Hirak_Roy at mentor.com> wrote:

Hi,

I have two MPI processes (server and client) launched independently by two different mpiexec commands (mpich-3.0.4, sock device):

1>    mpiexec -disable-auto-cleanup -n 1 ./server

2>    mpiexec -disable-auto-cleanup -n 1 ./client

The server opens a port and calls MPI_Comm_accept.

The client gets the port information and calls MPI_Comm_connect, which gives us a new intercommunicator.

I don't call MPI_Intercomm_merge.

I install my own signal handler for SIGUSR1 before I even call MPI_Init (I assume the library will then automatically chain to my handler):

>> signal (SIGUSR1, mysignalhandler);

Now suppose the 'client' process gets killed (I forcefully kill it with signal 9). I expected the 'server' process to receive SIGUSR1.

However, I don't get any signal in 'server' process.

Am I doing something wrong?

I have noticed that if I start 4 client processes with a single mpiexec command and one client gets killed, the remaining 3 clients receive SIGUSR1.

Does this mean that SIGUSR1 is not forwarded across processes connected through an intercommunicator?

Thanks,

Hirak
_______________________________________________
discuss mailing list     discuss at mpich.org