<div dir="ltr">Hello,<div><br></div><div>Lately, I have encountered an issue when using MPI_Comm_spawn together with MPI_Intercomm_merge: on some occasions my application hangs indefinitely. I was hoping that this forum could advise me on a solution.<br><div><br></div><div><div>For my tests, I run the code with 2 processes and pass the number of processes to spawn with MPI_Comm_spawn as an argument; I use 20. In my experiments, 3 to 4 out of 100 executions hang indefinitely in MPI_Barrier. However, if I remove both MPI_Intercomm_merge calls, the code always runs to completion.</div><div><br></div><div><div>Also, before executing the code I set the environment variable FI_PROVIDER=verbs.</div><div>If I change the provider to "tcp" or "udp", the code doesn't hang, but performance is lower with providers other than "verbs".</div></div><div><br></div><div>The MPICH version used is 4.2.1, built with the following configure line:</div><div>./configure --prefix=... --with-device=ch4:ofi --disable-psm3</div><div><br></div><div>In addition, I have attached a <a href="https://urldefense.us/v3/__https://lorca.act.uji.es/gitlab/martini/mpich_ofi_mpi_intercomm_merge_bug/-/blob/master/log-providers.txt__;!!G_uCfscf7eWS!fbrsIVjsyUNeOENsQntqYTQR8AN6gWXHB_RXhprRM6knmxEqBKQzmHG5XiAj_Fu7WZ1bk9EhFZQ4VQ$">file</a> with the output of running mpirun with "MPIR_CVAR_DEBUG_SUMMARY=1" set and FI_PROVIDER unset.</div></div><div><br></div><div>The <a href="https://urldefense.us/v3/__https://lorca.act.uji.es/gitlab/martini/mpich_ofi_mpi_intercomm_merge_bug/-/blob/master/BaseCode.c__;!!G_uCfscf7eWS!fbrsIVjsyUNeOENsQntqYTQR8AN6gWXHB_RXhprRM6knmxEqBKQzmHG5XiAj_Fu7WZ1bk9G9fhVjRg$">minimal code</a> to reproduce the problem is the following:</div><div>===============</div><div>#include &lt;stdio.h&gt;<br>#include &lt;stdlib.h&gt;<br>#include &lt;mpi.h&gt;<br><br>int main(int argc, char* argv[]) {<br> MPI_Init(&amp;argc, &amp;argv);<br> <br> MPI_Comm spawn, new_comm;<br> MPI_Comm_get_parent(&amp;spawn);<br> if(spawn == MPI_COMM_NULL) {<br> // Parent: argv[1] is the number of processes to spawn.<br> int num_c = atoi(argv[1]);<br> MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, num_c, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &amp;spawn, MPI_ERRCODES_IGNORE);<br> // Merge with the children; high = 0 orders the parents first.<br> MPI_Intercomm_merge(spawn, 0, &amp;new_comm);<br> } else {<br> // Child: merge with the parents; high = 1 orders the children last.<br> MPI_Intercomm_merge(spawn, 1, &amp;new_comm);<br> // Barrier among the spawned processes only; this is where the hang occurs.<br> MPI_Barrier(MPI_COMM_WORLD);<br> }<br> MPI_Finalize(); <br> return 0;<br>}<br></div><div>===============</div><div><br></div><div><div>Thank you for your help.<br>Best regards,<br></div><div>Iker</div></div><div><br></div></div></div>