<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">The libfabric version is 1.12.1<div class=""><br class=""></div><div class="">Here is the log as asked:</div><div class=""><br class=""></div><div class=""><div class="">==== Capability set configuration ====</div><div class="">libfabric provider: udp;ofi_rxd</div><div class="">MPIDI_OFI_ENABLE_AV_TABLE: 1</div><div class="">MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 0</div><div class="">MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0</div><div class="">MPIDI_OFI_ENABLE_MR_SCALABLE: 0</div><div class="">MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0</div><div class="">MPIDI_OFI_ENABLE_MR_ALLOCATED: 0</div><div class="">MPIDI_OFI_ENABLE_MR_PROV_KEY: 0</div><div class="">MPIDI_OFI_ENABLE_TAGGED: 1</div><div class="">MPIDI_OFI_ENABLE_AM: 1</div><div class="">MPIDI_OFI_ENABLE_RMA: 1</div><div class="">MPIDI_OFI_ENABLE_ATOMICS: 1</div><div class="">MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1</div><div class="">MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0</div><div class="">MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0</div><div class="">MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1</div><div class="">MPIDI_OFI_ENABLE_HMEM: 0</div><div class="">MPIDI_OFI_NUM_AM_BUFFERS: 8</div><div class="">MPIDI_OFI_CONTEXT_BITS: 20</div><div class="">MPIDI_OFI_SOURCE_BITS: 0</div><div class="">MPIDI_OFI_TAG_BITS: 31</div><div class="">======================================</div><div class="">MAXIMUM SUPPORTED RANKS: 4294967296</div><div class="">MAXIMUM TAG: 2147483648</div><div class="">======================================</div><div class="">Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in MPI_Init: Other MPI error, error stack:</div><div class="">MPIR_Init_thread(152).......:</div><div class="">MPID_Init(597)..............:</div><div class="">MPIDI_OFI_mpi_init_hook(674):</div><div class="">create_vni_context(964).....: OFI resource bind failed 
(ofi_init.c:964:create_vni_context:No message available on STREAM)</div><div class="">…</div><div class=""><br class=""></div><div class="">Michka</div><div><br class=""><blockquote type="cite" class=""><div class="">On 4 May 2021, at 00:53, Zhou, Hui <<a href="mailto:zhouh@anl.gov" class="">zhouh@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta charset="UTF-8" class=""><div style="font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;" class="">Hi Michka,</div><div style="font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;" class=""><br class=""></div><div style="font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;" class="">Which libfabric version are you using? 
Could you try setting `MPIR_CVAR_CH4_OFI_CAPABILITY_SETS_DEBUG=` to see if there are more debug messages?</div><div style="font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;" class=""><br class=""></div><div style="font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;" class="">--<span class="Apple-converted-space"> </span><br class=""></div><div style="font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;" class="">Hui Zhou<br class=""></div><div id="appendonsend" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""></div><hr tabindex="-1" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; display: inline-block; width: 1092.6875px;" class=""><span 
style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class=""></span><div id="divRplyFwdMsg" dir="ltr" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><font face="Calibri, sans-serif" style="font-size: 11pt;" class=""><b class="">From:</b><span class="Apple-converted-space"> </span>Michka Popoff via discuss <<a href="mailto:discuss@mpich.org" class="">discuss@mpich.org</a>><br class=""><b class="">Sent:</b><span class="Apple-converted-space"> </span>Saturday, May 1, 2021 4:33 PM<br class=""><b class="">To:</b><span class="Apple-converted-space"> </span><a href="mailto:discuss@mpich.org" class="">discuss@mpich.org</a> <<a href="mailto:discuss@mpich.org" class="">discuss@mpich.org</a>><br class=""><b class="">Cc:</b><span class="Apple-converted-space"> </span>Michka Popoff <<a href="mailto:michkapopoff@gmail.com" class="">michkapopoff@gmail.com</a>><br class=""><b class="">Subject:</b><span class="Apple-converted-space"> </span>[mpich-discuss] Mpich failure with external libfabric on macOS: OFI resource bind failed</font><div class=""> </div></div><div class="" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; 
word-wrap: break-word; line-break: after-white-space;">Hi<div class=""><br class=""></div><div class="">Homebrew maintainer here (<a href="https://github.com/Homebrew" class="">https://github.com/Homebrew</a>).</div><div class="">Homebrew ships mpich as a package on both macOS and Linux.</div><div class=""><br class=""></div><div class="">The issue below was found with version 3.4.1 but may have been present for longer.</div><div class=""><br class=""></div><span class="">We noticed that mpich was building its own internal libfabric dependency.<br class=""></span><span class="">After reading <a href="https://lists.mpich.org/pipermail/discuss/2021-January/006092.html" class="">https://lists.mpich.org/pipermail/discuss/2021-January/006092.html</a>,<br class=""></span><span class="">we added `--with-device=ch4:ofi` and set the libfabric path with the `--with-libfabric=` flag,<br class=""></span><span class="">to use our own version.<br class=""></span><span class=""><br class=""></span><div class=""><font class="">The build is fine. We have a small test to check that mpich still works:</font></div><div class=""><font class=""><br class=""></font></div><div class=""><font class=""><div class="">#include <mpi.h></div><div class="">#include <stdio.h></div><div class="">int main()</div><div class="">{</div><div class=""> int size, rank, nameLen;</div><div class=""> char name[MPI_MAX_PROCESSOR_NAME];</div><div class=""> MPI_Init(NULL, NULL);</div><div class=""> MPI_Comm_size(MPI_COMM_WORLD, &size);</div><div class=""> MPI_Comm_rank(MPI_COMM_WORLD, &rank);</div><div class=""> MPI_Get_processor_name(name, &nameLen);</div><div class=""> printf("[%d/%d] Hello, world! 
My name is %s.\n", rank, size, name);</div><div class=""> MPI_Finalize();</div><div class=""> return 0;</div><div class="">}</div><div class=""><br class=""></div><div class="">Executing the test fails with a weird error:</div><div class=""><br class=""></div><div class=""><div class="">/usr/local/Cellar/mpich/3.4.1_2/bin/mpicc hello.c -o hello</div><div class="">./hello</div><div class="">Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in MPI_Init: Other MPI error, error stack:</div><div class="">MPIR_Init_thread(152).......: </div><div class="">MPID_Init(597)..............: </div><div class="">MPIDI_OFI_mpi_init_hook(674): </div><div class="">create_vni_context(964).....: OFI resource bind failed (ofi_init.c:964:create_vni_context:No message available on STREAM)</div><div class="">[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1615247</div><div class="">:</div><div class="">system msg for write_line failure : Bad file descriptor</div><div class="">Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in MPI_Init: Other MPI error, error stack:</div><div class="">MPIR_Init_thread(152).......: </div><div class="">MPID_Init(597)..............: </div><div class="">MPIDI_OFI_mpi_init_hook(674): </div><div class="">create_vni_context(964).....: OFI resource bind failed (ofi_init.c:964:create_vni_context:No message available on STREAM)</div><div class="">[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1615247</div><div class="">:</div><div class="">system msg for write_line failure : Bad file descriptor</div></div><div class=""><br class=""></div><div class="">This test passes on Linux and fails only on macOS. 
Using the internal libfabric is fine on both platforms.</div><div class=""><br class=""></div><div class="">Here is the related discussion:</div><div class=""><a href="https://github.com/Homebrew/homebrew-core/pull/73062" class="">https://github.com/Homebrew/homebrew-core/pull/73062</a></div><div class=""><br class=""></div><div class="">Maybe you could help us debug this issue?</div><div class=""><br class=""></div><div class="">Regards</div><div class=""><br class=""></div><div class="">Michka</div></font></div></div></div></blockquote></div><br class=""></div></body></html>