<div dir="ltr">Hello,<div><br></div><div>I recently encountered unexpected behaviour of the MPI_Probe + MPI_Get_count functions under specific conditions, and I was hoping that this forum could advise me on a solution.</div><div> <br>Specifically, the application performs an MPI_Send from the root process to process B. Process B doesn't know the size of the message in advance, so it uses MPI_Probe + MPI_Get_count to discover it. However, MPI_Get_count reports a much smaller size than was actually sent: for example, if the message is 1000 bytes, MPI_Get_count on process B returns only 20 bytes.<br></div><div><br></div><div>The problem only occurs with specific MPICH installations (listed below) and when the following conditions are met in my code:<br>- The communication is internode; the problem does not appear in intranode communication.<br>- The communication uses a derived datatype. Specifically, a derived type that communicates a vector of integers and a vector of reals, both with the same number of elements.<br></div><div><br></div><div>Some further observations:<br>- None of the MPI functions return an error code; they all return MPI_SUCCESS.<br>- If, instead of allocating the number of bytes returned by MPI_Get_count (20), I allocate the expected size (1000 bytes), the message is received correctly.<br>- The size returned by MPI_Get_count seems to vary with the total number of addresses used to create the derived type.<br></div><div><br></div><div>I have attached a file that reproduces the problem. 
It can also be accessed via the GitLab link below:<br><a href="https://lorca.act.uji.es/gitlab/martini/mpich_ofi_mpi_probe_bug">https://lorca.act.uji.es/gitlab/martini/mpich_ofi_mpi_probe_bug</a><br>It is designed to be run with 3 processes: two on one node and the third on a different node.<br></div><div><br></div><div>As mentioned above, this problem occurs when MPICH uses the ch4:ofi device with an external libfabric instead of the embedded one. Specifically, I have tested the following installations, all of which show the error:<br></div><div>- MPICH 4.2.0 with config options: '--with-device=ch4:ofi' '--with-libfabric=/home/martini/Instalaciones/libfabric-1.16.1'<br></div><div>- MPICH 4.0.3 with config options: '--with-device=ch4:ofi' '--with-libfabric=/home/martini/Instalaciones/libfabric-1.16.1'</div><div>- MPICH 4.0.3 with config options: '--with-device=ch4:ofi' '--with-libfabric=/home/martini/Instalaciones/libfabric-1.16.1' '--disable-psm3'<br></div><div>- MPICH 3.4.1 with config options: '--with-device=ch4:ofi' '--with-libfabric=/home/martini/Instalaciones/libfabric-1.16.1'<br></div><div><br></div><div>However, it works as expected with the following MPICH installations:</div><div>- MPICH 4.0.3 with config options: '--with-device=ch4:ofi' '--with-libfabric=embedded'<br></div><div>- MPICH 4.0.3 with config options: '--with-device=ch4:ucx' '--with-ucx=/soft/gnu/ucx-1.11'</div><div>- MPICH 3.4.1 with config options: '--with-device=ch4:ucx' '--with-ucx=/soft/gnu/ucx-1.11'</div><div><br></div><div>Although the code works with these installations, we would like to use a libfabric installation other than the embedded one because it gives us better network performance. 
As for UCX, we cannot use it because the application relies on MPI_Comm_spawn, which MPICH does not currently support with UCX.</div><div><br></div><div>Thank you for your help.<br>Best regards,<br></div><div>Iker</div></div>