<div dir="ltr"><div>I have several MPI programs here. The most complicated one started exiting with a report that a process got signal 9 while cleaning up after a run that it had reported as successful. Many of the other MPI processes showed truncated output, as if they too had received signal 9. Only that one program has this problem; the other programs don't. I tried to reduce the big program to a small test case that reproduces the issue, but was unsuccessful.</div><div><br></div>
<div>I attached gdb to hydra_pmi_proxy and found that it is the process sending signal 9 to the various MPI processes:</div><div><br></div>
<pre>(gdb) where
#0  0x00007f4b17853d7e in killpg () from /lib64/libc.so.6
#1  0x00000000004053e2 in PMIP_bcast_signal (sig=sig@entry=9) at proxy/pmip_pg.c:259
#2  0x0000000000406e60 in pmi_cb (fd=9, events=&lt;optimized out&gt;, userp=&lt;optimized out&gt;)
    at proxy/pmip_cb.c:326
#3  0x0000000000421418 in HYDT_dmxu_poll_wait_for_event (wtime=&lt;optimized out&gt;)
    at lib/tools/demux/demux_poll.c:75
#4  0x0000000000403ff5 in main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at proxy/pmip.c:121
</pre>
<div>At that time I was using MPICH 4.3.0, so I upgraded to 5.0.0 in the hope that it would resolve the problem, but 5.0.0 shows the same symptom. All of this is happening on SUSE Linux 15.5.</div><div><br></div>
<div>On CentOS 7 and Rocky Linux 9 we use MVAPICH2 2.3.6, so as an experiment I took mpirun and hydra_pmi_proxy from 2.3.6 and used them in place of the versions from the MPICH 5.0.0 release. With those, the program runs without difficulty. All of this suggests to me that hydra_pmi_proxy has incorrectly determined that one of the MPI processes exited with a signal. Any suggestions about what's going on?</div><div><br></div><div><br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><br>.. 
Lana (<a href="mailto:lana.deere@gmail.com" target="_blank">lana.deere@gmail.com</a>)<br><br><br></div></div></div>