<div dir="ltr">Since there has been no update on this after I posted the debug messages, I am following up with a new suggestion I have since received from someone.<div>I was told that MPICH was built with PMIx support only, whereas <code style="margin:0px;border:0px;font-variant-numeric:inherit;font-variant-east-asian:inherit;font-variant-alternates:inherit;font-stretch:inherit;line-height:inherit;font-kerning:inherit;font-feature-settings:inherit;vertical-align:baseline;box-sizing:inherit;color:rgb(12,13,14)">mpiexec.hydra</code><span style="color:rgb(12,13,14);font-family:-apple-system,BlinkMacSystemFont,'Segoe UI Adjusted','Segoe UI','Liberation Sans',sans-serif;font-size:15px"> only provides a </span><code style="margin:0px;border:0px;font-variant-numeric:inherit;font-variant-east-asian:inherit;font-variant-alternates:inherit;font-stretch:inherit;line-height:inherit;font-kerning:inherit;font-feature-settings:inherit;vertical-align:baseline;box-sizing:inherit;color:rgb(12,13,14)">PMI</code><span style="color:rgb(12,13,14);font-family:-apple-system,BlinkMacSystemFont,'Segoe UI Adjusted','Segoe UI','Liberation Sans',sans-serif;font-size:15px"> server. 
</span></div><div><span style="color:rgb(12,13,14);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI Adjusted","Segoe UI","Liberation Sans",sans-serif;font-size:15px">Could that be the source of the problem? </span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 29 Jul 2024 at 11:15, Stephen Wong <<a href="mailto:stephen.photond@gmail.com">stephen.photond@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">For the build configured with --with-device=ch4:ofi, I got<div><br></div><div>==== Various sizes and limits ====<br>sizeof(MPIDI_per_vci_t): 192<br>Required minimum FI_VERSION: 0, current version: 10014<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, 
FI_SOCKADDR_IN6 [28] ::1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp;ofi_rxm, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: udp;ofi_rxd, score = 5, pref = -2, FI_SOCKADDR_IN6 [28] ::1<br>provider: shm, score = 4, pref = -2, FI_ADDR_STR [14] - fi_shm://4595<br>provider: shm, score = 4, pref = -2, FI_ADDR_STR [14] - fi_shm://4595<br>provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: udp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp, score = 0, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: tcp, score = 4, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: sockets, score = 3, pref = 0, FI_SOCKADDR_IN6 [28] 
::1<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 192.168.1.5<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] fe80::1ff:fe23:4567:890a<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN [16] 127.0.0.1<br>provider: sockets, score = 5, pref = 0, FI_SOCKADDR_IN6 [28] ::1<br>provider: sm2, score = 3, pref = 0, FI_ADDR_STR [13] - fi_sm2://4595<br>provider: sm2, score = 3, pref = 0, FI_ADDR_STR [13] - fi_sm2://4595<br>Required minimum FI_VERSION: 10005, current version: 10014<br>==== Capability set configuration ====<br>libfabric provider: sockets - 192.168.1.0/24<br>MPIDI_OFI_ENABLE_DATA: 1<br>MPIDI_OFI_ENABLE_AV_TABLE: 1<br>MPIDI_OFI_ENABLE_SCALABLE_ENDPOINTS: 1<br>MPIDI_OFI_ENABLE_SHARED_CONTEXTS: 0<br>MPIDI_OFI_ENABLE_MR_VIRT_ADDRESS: 0<br>MPIDI_OFI_ENABLE_MR_ALLOCATED: 0<br>MPIDI_OFI_ENABLE_MR_REGISTER_NULL: 1<br>MPIDI_OFI_ENABLE_MR_PROV_KEY: 0<br>MPIDI_OFI_ENABLE_TAGGED: 1<br>MPIDI_OFI_ENABLE_AM: 1<br>MPIDI_OFI_ENABLE_RMA: 1<br>MPIDI_OFI_ENABLE_ATOMICS: 1<br>MPIDI_OFI_FETCH_ATOMIC_IOVECS: 1<br>MPIDI_OFI_ENABLE_DATA_AUTO_PROGRESS: 0<br>MPIDI_OFI_ENABLE_CONTROL_AUTO_PROGRESS: 0<br>MPIDI_OFI_ENABLE_PT2PT_NOPACK: 1<br>MPIDI_OFI_ENABLE_TRIGGERED: 0<br>MPIDI_OFI_ENABLE_HMEM: 0<br>MPIDI_OFI_NUM_AM_BUFFERS: 8<br>MPIDI_OFI_NUM_OPTIMIZED_MEMORY_REGIONS: 0<br>MPIDI_OFI_CONTEXT_BITS: 20<br>MPIDI_OFI_SOURCE_BITS: 0<br>MPIDI_OFI_TAG_BITS: 31<br>MPIDI_OFI_VNI_USE_DOMAIN: 1<br>MAXIMUM SUPPORTED RANKS: 4294967296<br>MAXIMUM TAG: 2147483648<br>==== Provider 
global thresholds ====<br>max_buffered_send: 255<br>max_buffered_write: 255<br>max_msg_size: 9223372036854775807<br>max_order_raw: -1<br>max_order_war: -1<br>max_order_waw: -1<br>tx_iov_limit: 8<br>rx_iov_limit: 8<br>rma_iov_limit: 8<br>max_mr_key_size: 8<br>==== Various sizes and limits ====<br>MPIDI_OFI_AM_MSG_HEADER_SIZE: 24<br>MPIDI_OFI_MAX_AM_HDR_SIZE: 255<br>sizeof(MPIDI_OFI_am_request_header_t): 416<br>sizeof(MPIDI_OFI_per_vci_t): 52480<br>MPIDI_OFI_AM_HDR_POOL_CELL_SIZE: 1024<br>MPIDI_OFI_DEFAULT_SHORT_SEND_SIZE: 16384<br>==== OFI dynamic settings ====<br>num_vcis: 1<br>num_nics: 1<br>======================================<br><br><em style="color:rgb(12,13,14);font-family:inherit;font-size:15px;font-variant:inherit;font-weight:inherit;margin:0px;padding:0px;border:0px;font-stretch:inherit;line-height:inherit;font-kerning:inherit;font-feature-settings:inherit;vertical-align:baseline;box-sizing:inherit">mpiexec -host host1,host2 -n 2 cpi</em><br><br>Abort(883025295) on node 1: Fatal error in internal_Init: Other MPI error, error stack:<br>internal_Init(48306).............: MPI_Init(argc=0x7ffcb8c7aedc, argv=0x7ffcb8c7aed0) failed<br>MPII_Init_thread(265)............: <br>MPIR_init_comm_world(34).........: <br>MPIR_Comm_commit(823)............: <br>MPID_Comm_commit_post_hook(222)..: <br>MPIDI_world_post_init(660).......: <br>MPIDI_OFI_init_vcis(842).........: <br>check_num_nics(891)..............: <br>MPIR_Allreduce_allcomm_auto(4726): <br>MPIC_Sendrecv(306)...............: <br>MPIC_Wait(91)....................: <br>MPIR_Wait(780)...................: <br>MPIR_Wait_state(737).............: <br>MPIDI_progress_test(134).........: <br>MPIDI_OFI_handle_cq_error(791)...: OFI poll failed (ofi_events.c:793:MPIDI_OFI_handle_cq_error:Input/output error)<br>Abort(883025295) on node 0: Fatal error in internal_Init: Other MPI error, error stack:<br>internal_Init(48306).............: MPI_Init(argc=0x7fff18beee1c, argv=0x7fff18beee10) 
failed<br>MPII_Init_thread(265)............: <br>MPIR_init_comm_world(34).........: <br>MPIR_Comm_commit(823)............: <br>MPID_Comm_commit_post_hook(222)..: <br>MPIDI_world_post_init(660).......: <br>MPIDI_OFI_init_vcis(842).........: <br>check_num_nics(891)..............: <br>MPIR_Allreduce_allcomm_auto(4726): <br>MPIC_Sendrecv(306)...............: <br>MPIC_Wait(91)....................: <br>MPIR_Wait(780)...................: <br>MPIR_Wait_state(737).............: <br>MPIDI_progress_test(134).........: <br>MPIDI_OFI_handle_cq_error(791)...: OFI poll failed (ofi_events.c:793:MPIDI_OFI_handle_cq_error:Input/output error)<br><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 26 Jul 2024 at 19:04, Zhou, Hui <<a href="mailto:zhouh@anl.gov" target="_blank">zhouh@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<div dir="ltr">
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Could you run cpi with <code>MPIR_CVAR_DEBUG_SUMMARY=1</code> and post the output?</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hui Zhou</div>
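<div>For reference, a minimal Python sketch of one way to produce that output, assuming the same hosts and <code>cpi</code> program as in the original report. It passes the variable with Hydra's <code>-genv</code> option so that every rank sees it. The helper names <code>debug_summary_command</code> and <code>run_and_capture</code> are hypothetical, and the sketch only prints the command unless the second helper is called.</div>
<pre>
```python
import shlex
import subprocess


def debug_summary_command(hosts, nprocs, program):
    """Build an mpiexec command that sets MPIR_CVAR_DEBUG_SUMMARY=1 for all ranks."""
    return [
        "mpiexec",
        "-genv", "MPIR_CVAR_DEBUG_SUMMARY", "1",  # Hydra option: export to every rank
        "-host", ",".join(hosts),
        "-n", str(nprocs),
        program,
    ]


def run_and_capture(cmd):
    """Run the command and return stdout and stderr merged into one string."""
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=False
    )
    return result.stdout


cmd = debug_summary_command(["host1", "host2"], 2, "cpi")
print(shlex.join(cmd))  # shows the command without running it
```
</pre>
<div>On a machine with MPICH installed, <code>run_and_capture(cmd)</code> would return the debug summary together with any error stack, ready to paste into a reply.</div>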
<div id="m_-2632354562306744420m_6868494427558226228appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_-2632354562306744420m_6868494427558226228divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Stephen Wong via discuss <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
<b>Sent:</b> Friday, July 26, 2024 4:46 AM<br>
<b>To:</b> <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a> <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
<b>Cc:</b> Stephen Wong <<a href="mailto:stephen.photond@gmail.com" target="_blank">stephen.photond@gmail.com</a>><br>
<b>Subject:</b> [mpich-discuss] OFI poll failed error if using more than one cluster node</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Hi,</div>
<div><br>
</div>
<div>(I sent this previously without a subject line.)</div>
<div><br>
</div>
<div><span style="color:rgb(12,13,14);font-family:-apple-system,BlinkMacSystemFont,'Segoe UI Adjusted','Segoe UI','Liberation Sans',sans-serif;font-size:15px">I am using MPICH 4.2.2 on Ubuntu 24.04 and testing with cpi, the small example program that calculates the value of pi using MPI. From host1 I can run cpi on either host1 or host2 alone, and from host2 I can run cpi on either host2 or host1 alone. The problem occurs only when I try to use host1 and host2 together.</span><br>
</div>
<div><span style="color:rgb(12,13,14);font-family:-apple-system,BlinkMacSystemFont,'Segoe UI Adjusted','Segoe UI','Liberation Sans',sans-serif;font-size:15px"><br>
</span></div>
<div>
<p style="margin:0px 0px 1.1em;padding:0px;border:0px;font-variant-numeric:inherit;font-variant-east-asian:inherit;font-variant-alternates:inherit;font-stretch:inherit;line-height:inherit;font-family:-apple-system,BlinkMacSystemFont,'Segoe UI Adjusted','Segoe UI','Liberation Sans',sans-serif;font-kerning:inherit;font-feature-settings:inherit;font-size:15px;vertical-align:baseline;box-sizing:inherit;clear:both;color:rgb(12,13,14)">
For example, running the command<br style="box-sizing:inherit">
<em style="margin:0px;padding:0px;border:0px;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;font-kerning:inherit;font-feature-settings:inherit;vertical-align:baseline;box-sizing:inherit">mpiexec -host host1,host2 -n 2 cpi</em><br style="box-sizing:inherit">
ends with the following error:</p>
<p style="margin:0px 0px 1.1em;padding:0px;border:0px;font-variant-numeric:inherit;font-variant-east-asian:inherit;font-variant-alternates:inherit;font-stretch:inherit;line-height:inherit;font-family:-apple-system,BlinkMacSystemFont,'Segoe UI Adjusted','Segoe UI','Liberation Sans',sans-serif;font-kerning:inherit;font-feature-settings:inherit;font-size:15px;vertical-align:baseline;box-sizing:inherit;clear:both;color:rgb(12,13,14)">
Abort(77718927) on node 1: Fatal error in internal_Init: Other MPI error, error stack:<br style="box-sizing:inherit">
internal_Init(48306).............: MPI_Init(argc=0x7ffdb68e7fec, argv=0x7ffdb68e7fe0) failed<br style="box-sizing:inherit">
MPII_Init_thread(265)............:<br style="box-sizing:inherit">
MPIR_init_comm_world(34).........:<br style="box-sizing:inherit">
MPIR_Comm_commit(823)............:<br style="box-sizing:inherit">
MPID_Comm_commit_post_hook(222)..:<br style="box-sizing:inherit">
MPIDI_world_post_init(660).......:<br style="box-sizing:inherit">
MPIDI_OFI_init_vcis(842).........:<br style="box-sizing:inherit">
check_num_nics(891)..............:<br style="box-sizing:inherit">
MPIR_Allreduce_allcomm_auto(4726):<br style="box-sizing:inherit">
MPIC_Sendrecv(306)...............:<br style="box-sizing:inherit">
MPIC_Wait(91)....................:<br style="box-sizing:inherit">
MPIR_Wait(780)...................:<br style="box-sizing:inherit">
MPIR_Wait_state(737).............:<br style="box-sizing:inherit">
MPIDI_progress_test(134).........:<br style="box-sizing:inherit">
MPIDI_OFI_handle_cq_error(791)...: OFI poll failed (ofi_events.c:793:MPIDI_OFI_handle_cq_error:Input/output error)<br style="box-sizing:inherit">
Abort(77718927) on node 0: Fatal error in internal_Init: Other MPI error, error stack:<br style="box-sizing:inherit">
internal_Init(48306).............: MPI_Init(argc=0x7ffcb5b28adc, argv=0x7ffcb5b28ad0) failed<br style="box-sizing:inherit">
MPII_Init_thread(265)............:<br style="box-sizing:inherit">
MPIR_init_comm_world(34).........:<br style="box-sizing:inherit">
MPIR_Comm_commit(823)............:<br style="box-sizing:inherit">
MPID_Comm_commit_post_hook(222)..:<br style="box-sizing:inherit">
MPIDI_world_post_init(660).......:<br style="box-sizing:inherit">
MPIDI_OFI_init_vcis(842).........:<br style="box-sizing:inherit">
check_num_nics(891)..............:<br style="box-sizing:inherit">
MPIR_Allreduce_allcomm_auto(4726):<br style="box-sizing:inherit">
MPIC_Sendrecv(306)...............:<br style="box-sizing:inherit">
MPIC_Wait(91)....................:<br style="box-sizing:inherit">
MPIR_Wait(780)...................:<br style="box-sizing:inherit">
MPIR_Wait_state(737).............:<br style="box-sizing:inherit">
MPIDI_progress_test(134).........:<br style="box-sizing:inherit">
MPIDI_OFI_handle_cq_error(791)...: OFI poll failed (ofi_events.c:793:MPIDI_OFI_handle_cq_error:Input/output error)</p>
I searched the archive of this mailing list and found only one thread that mentions this OFI poll failed error.</div>
<div>That thread seemed to say the error has something to do with the ch4:ofi device configuration.</div>
<div>I checked my configure log, and it shows</div>
<div>device : ch4:ofi (embedded libfabric)</div>
<div>for the configuration used when I built MPICH. So I am wondering whether switching this option to something else would fix the problem. I am not sure what other option I could substitute for ch4:ofi.</div>
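<div>As a cross-check on the configure log, the device of the installed build can also be read from the output of MPICH's <code>mpichversion</code> utility. The following is a minimal Python sketch, assuming output in the usual <code>MPICH &lt;field&gt;: value</code> layout. The sample text and the helper name <code>parse_fields</code> are hypothetical and not taken from this system.</div>
<pre>
```python
import re

# Hypothetical sample in the layout printed by `mpichversion`;
# the values are illustrative only.
SAMPLE = """\
MPICH Version:      4.2.2
MPICH Device:       ch4:ofi
MPICH configure:    --prefix=/opt/mpich --with-device=ch4:ofi
"""


def parse_fields(text):
    """Return a dict mapping each 'MPICH NAME:' label to its value."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"MPICH ([^:]+):\s*(.*)", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


fields = parse_fields(SAMPLE)
print(fields["Device"])
```
</pre>
<div>On a real installation, the text could come from <code>subprocess.run(["mpichversion"], capture_output=True, text=True).stdout</code> on each host, which also makes it easy to confirm that both hosts run the same build.</div>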
<div><br>
</div>
<div>*****************************************************</div>
<div><br>
</div>
<div>Next, I ran configure for a new build with the --with-device=ch3:nemesis option.</div>
<div>Again, I can run cpi on either host1 or host2 alone, but running it on host1 and host2 together crashes with a core dump.</div>
<div><br>
</div>
<div>Using the --with-device=ch3:sock configure option gave more or less the same result, except that cpi now quits silently when run on host1 and host2 together.</div>
<div><br>
</div>
<div>Any ideas?</div>
<div>Thanks!</div>
<font color="#888888">
<div>Stephen.</div>
<div><br>
</div>
</font></div>
</div>
</div>
</div></blockquote></div>
</blockquote></div>
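<div>On the PMI versus PMIx question at the top of this thread, one way to check is to look at the options MPICH was configured with and list anything PMI-related, so it can be compared with what the launcher provides. The following is a minimal Python sketch. The helper name <code>pmi_related_flags</code> and both sample configure lines are hypothetical, and the sketch only filters option names; it does not decide which PMI version a build actually uses.</div>
<pre>
```python
import shlex


def pmi_related_flags(configure_line):
    """Return the configure options whose name mentions 'pmi' (PMI, PMI2, PMIx)."""
    flags = []
    for token in shlex.split(configure_line):
        name = token.split("=", 1)[0]  # option name without its value
        if token.startswith("--") and "pmi" in name.lower():
            flags.append(token)
    return flags


# Hypothetical configure lines, for illustration only.
with_pmix = pmi_related_flags("--prefix=/opt/mpich --with-device=ch4:ofi --with-pmix=/usr")
without = pmi_related_flags("--prefix=/opt/mpich --with-device=ch4:ofi")
print(with_pmix)
print(without)
```
</pre>
<div>An empty list only means that no PMI-related option was given explicitly, in which case the build uses whatever default configure selected.</div>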