<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hi Patrick,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for referencing the `mpi4py` issue. I am cross-linking it to
<a href="https://github.com/pmodels/mpich/issues/4130">https://github.com/pmodels/mpich/issues/4130</a>. The problem is how Fortran common blocks work, or rather that they work in a non-standard way. When libmpifort.so is linked in later, it creates separate
common blocks that are disconnected from the earlier ones. This is probably because MPICH still maintains F77 compatibility. So rather than saying this is not an MPICH issue, I would say it is an issue for which there is currently no simple fix.<o:p></o:p></p>
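<p class="MsoNormal">To make the failure mode concrete, here is a minimal Python simulation. It is not MPICH code. It assumes, as described in this thread, that the Fortran bindings recognize MPI_IN_PLACE by comparing the buffer address against the address of a common-block variable. Each `ctypes.c_int` below is a hypothetical stand-in for one copy of that common block, and the check succeeds only when the caller and the bindings share the same copy.</p>
<pre>
```python
import ctypes

# Hypothetical stand-ins for the common-block variable that holds MPI_IN_PLACE.
# With one shared copy, user code and bindings refer to the same object; once a
# second, disconnected copy exists, they are distinct objects in memory.
copy_used_by_bindings = ctypes.c_int(0)
copy_used_by_user_code = ctypes.c_int(0)


def is_in_place(buffer):
    """Address-based check: is `buffer` the bindings' own sentinel variable?"""
    return ctypes.addressof(buffer) == ctypes.addressof(copy_used_by_bindings)


# Shared copy: the caller passes exactly the variable the bindings compare against.
shared_ok = is_in_place(copy_used_by_bindings)

# Disconnected copies: same name and value but a different address, so the check
# fails and the sentinel would be treated as an ordinary user buffer.
split_ok = is_in_place(copy_used_by_user_code)

print(shared_ok, split_ok)  # True False
```
</pre>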
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">-- <br>
Hui Zhou<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Patrick McNally via discuss <discuss@mpich.org><br>
<b>Reply-To: </b>"discuss@mpich.org" <discuss@mpich.org><br>
<b>Date: </b>Thursday, July 2, 2020 at 5:34 AM<br>
<b>To: </b>"discuss@mpich.org" <discuss@mpich.org><br>
<b>Cc: </b>Patrick McNally <rpmcnally@gmail.com><br>
<b>Subject: </b>Re: [mpich-discuss] Using MPICH in Python breaks Fortran MPI_IN_PLACE<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thank you both for the follow-up. I did engage with Lisandro first; that discussion is here: <a href="https://bitbucket.org/mpi4py/mpi4py/issues/162/mpi4py-initialization-breaks-fortran">https://bitbucket.org/mpi4py/mpi4py/issues/162/mpi4py-initialization-breaks-fortran</a>.
The short version is that this isn't an mpi4py issue either. For my particular case, mpi4py could possibly fix it by linking to the Fortran bindings even though it doesn't need or use them, but I think that focusing on mpi4py misses the larger point.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">As the standalone test files he provided show, this issue can definitely happen without mpi4py in the loop. It appears that any third-party Python/C library that uses MPI (or custom Python bindings to the MPI C library) would
cause Fortran code invoked later to behave incorrectly. Having the Fortran-accessible C wrappers test the buffer against the address of a variable defined in another library seems odd to me, but I don't understand your codebase well enough to propose
an alternative. Even if this is a problem that the end user needs to fix, it would be nice to have some user-facing documentation that explains the problem and provides a remedy.<o:p></o:p></p>
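<p class="MsoNormal">One possible user-side check is sketched below in Python. It is a heuristic, not an MPICH tool. It assumes Linux (it reads /proc/self/maps) and shared objects named libmpi.so* and libmpifort.so*. It flags the risky state reported in this thread, where the C library is already loaded but the Fortran bindings are not, so it should run before any Fortran code that uses MPI is loaded. Both function names are hypothetical.</p>
<pre>
```python
import os


def mapped_libraries(name_fragment):
    """Return paths of shared objects mapped into this process whose file name
    contains `name_fragment`. Linux-only; returns [] where /proc is unavailable."""
    try:
        with open("/proc/self/maps") as maps:
            # Lines look like "addr perms offset dev inode pathname"; the pathname
            # column is present only for file-backed mappings.
            paths = {fields[5] for fields in (line.split() for line in maps)
                     if len(fields) >= 6}
    except OSError:
        return []
    return sorted(p for p in paths if name_fragment in os.path.basename(p))


def fortran_mpi_at_risk():
    """True if libmpi is mapped but libmpifort is not yet: the load order
    reported in this thread to break MPI_IN_PLACE in later-loaded Fortran code."""
    return bool(mapped_libraries("libmpi.so")) and not mapped_libraries("libmpifort")


if fortran_mpi_at_risk():
    print("warning: libmpi.so was loaded before libmpifort.so")
```
</pre>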
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">-Patrick<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, Jul 1, 2020 at 8:40 PM Jeff Hammond via discuss <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">Has Lisandro provided any feedback on this anywhere? I agree with Hui. This appears to be an artifact of the way MPI uses magic values for MPI_IN_PLACE and how those get encoded in binaries. I naively speculate that MPICH could encode
the magic values differently to work around this, but that would likely break ABI compatibility, at least on the Fortran side.<o:p></o:p></p>
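<p class="MsoNormal">As a rough Python analogy for Jeff's point (not a proposal for how MPICH should change), the sketch below contrasts a value-encoded marker, which every loaded copy of a library agrees on because the literal is duplicated into each copy, with an address-encoded marker, which is a distinct object in every copy. For comparison, MPICH's C binding uses a fixed pointer constant for MPI_IN_PLACE, while Fortran passes the address of a variable. All names below are hypothetical.</p>
<pre>
```python
IN_PLACE_VALUE = -1  # hypothetical value-encoded marker, identical in every copy


def load_copy_of_bindings():
    """Simulate loading one more, disconnected copy of the bindings:
    literals are duplicated unchanged, but each copy gets fresh variables."""
    return {"marker_value": IN_PLACE_VALUE, "marker_object": object()}


seen_by_user_code = load_copy_of_bindings()
seen_by_library = load_copy_of_bindings()

# Value encoding: both copies compare equal, so the marker is still recognized.
value_encoding_works = seen_by_user_code["marker_value"] == seen_by_library["marker_value"]

# Address encoding: object identity differs between copies, so the marker is missed.
address_encoding_works = seen_by_user_code["marker_object"] is seen_by_library["marker_object"]

print(value_encoding_works, address_encoding_works)  # True False
```
</pre>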
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Jeff<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Wed, Jun 10, 2020 at 3:31 PM Zhou, Hui via discuss <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I am not sure this is an MPICH issue; it seems to be more of an issue with `mpi4py`. The issue is exactly as you suspected: `libmpifort.so` needs to be loaded before `libmpi.so`, or the
external symbol in `libmpifort.so` won’t get resolved. <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">> It works if you load something linked to libmpifort first or load everything with RTLD_GLOBAL. <br>
<br>
That seems plausible. I don’t have any better idea than fixing `mpi4py` so that it always loads `libmpifort.so` first.<o:p></o:p></p>
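<p class="MsoNormal">An application can also apply the same idea itself without waiting for a change in mpi4py. The Python sketch below loads MPICH's Fortran bindings with RTLD_GLOBAL before anything that links only to libmpi. It assumes `ctypes.util.find_library` can locate the library under the name "mpifort"; `preload_mpifort` is a hypothetical helper, and this has not been verified against the test case in this thread.</p>
<pre>
```python
import ctypes
import ctypes.util


def preload_mpifort():
    """Load libmpifort into the global symbol scope before any MPI user.

    Returns the ctypes handle, or None if no libmpifort can be located.
    """
    path = ctypes.util.find_library("mpifort")  # a library name, or None if absent
    if path is None:
        return None
    # RTLD_GLOBAL makes the library's symbols visible to everything loaded later.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


mpifort_handle = preload_mpifort()
# Only now import mpi4py or any other module that links to libmpi, e.g.:
# from mpi4py import MPI
```
</pre>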
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-- <br>
Hui Zhou<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Patrick McNally via discuss <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
<b>Reply-To: </b>"<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>" <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
<b>Date: </b>Wednesday, June 10, 2020 at 1:09 PM<br>
<b>To: </b>"<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>" <<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
<b>Cc: </b>Patrick McNally <<a href="mailto:rpmcnally@gmail.com" target="_blank">rpmcnally@gmail.com</a>><br>
<b>Subject: </b>Re: [mpich-discuss] Using MPICH in Python breaks Fortran MPI_IN_PLACE</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">I hate to bug you, but this is a pretty serious issue. I suspect it is also why we get segfaults when trying to use similar special variables such as MPI_STATUSES_IGNORE. Any insight would be appreciated.<o:p></o:p></p>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Patrick<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">On Wed, May 27, 2020 at 10:25 AM Patrick McNally <<a href="mailto:rpmcnally@gmail.com" target="_blank">rpmcnally@gmail.com</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Our application consists primarily of a Python head calling into Fortran routines to do the heavy lifting. We have never been able to use MPI_IN_PLACE successfully in Fortran but
weren't sure why. Recently, we discovered that it works fine in standalone Fortran code and is broken only when the Fortran code is run through our Python modules.<br>
<br>
The issue appears to arise when code that links only to the C library, libmpi, is loaded first and with RTLD_LOCAL, as happens when we load mpi4py. It works if you load something linked to libmpifort first or load everything with RTLD_GLOBAL. I'm assuming
this has something to do with how MPICH tests the address of MPIR_F08_MPI_IN_PLACE, but I don't understand shared-object (.so) loading well enough to fully grasp the issue. Below is some standalone code that reproduces the issue. I'd appreciate any insight you can provide into why
this is happening.<br>
<br>
Relevant system details:<br>
RHEL 7.8<br>
Python 2.7<br>
GCC 7.3.0<br>
MPICH 3.3.2 (and 3.2)<br>
<br>
The below files are also available towards the end of the bug report at the following link:<br>
<a href="https://bitbucket.org/mpi4py/mpi4py/issues/162/mpi4py-initialization-breaks-fortran" target="_blank">https://bitbucket.org/mpi4py/mpi4py/issues/162/mpi4py-initialization-breaks-fortran</a><o:p></o:p></p>
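<p class="MsoNormal">Since loading everything with RTLD_GLOBAL avoids the problem, a Python application that imports its Fortran code as extension modules (for example, modules built with f2py, which is an assumption about the setup) can request that mode for all later imports via `sys.setdlopenflags`. A minimal sketch, assuming Python 3 on a Unix-like system (`os.RTLD_GLOBAL` is not available in Python 2.7; `ctypes.RTLD_GLOBAL` can be used there instead). The helper and module names are hypothetical. These flags do not affect `ctypes.CDLL`, which takes its own mode argument, as in test.py.</p>
<pre>
```python
import os
import sys


def enable_global_extension_loading():
    """Make extension modules imported from now on load with RTLD_GLOBAL.

    Returns the previous flags so the caller can restore them later with
    sys.setdlopenflags(previous). Unix-only.
    """
    previous = sys.getdlopenflags()
    sys.setdlopenflags(previous | os.RTLD_GLOBAL)
    return previous


# Call this at the top of the main script, before importing mpi4py or any
# extension module wrapping Fortran MPI code (module names are hypothetical):
previous_flags = enable_global_extension_loading()
# from mpi4py import MPI
# import my_fortran_solver
```
</pre>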
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Patrick<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><br>
makefile<br>
-----------<br>
libs = testc.so testf.so<br>
all: $(libs)<br>
<br>
testc.so: testc.c<br>
mpicc -shared -fPIC $< -o $@<br>
<br>
testf.so: testf.f90<br>
mpifort -shared -fPIC $< -o $@<br>
<br>
clean:<br>
$(RM) $(libs)<br>
<br>
testc.c<br>
---------<br>
#include <stddef.h><br>
#include <stdio.h><br>
#include <mpi.h><br>
<br>
extern void initc(void);<br>
extern void testc(void);<br>
<br>
void initc(void)<br>
{<br>
MPI_Init(NULL,NULL);<br>
}<br>
<br>
void testc(void)<br>
{<br>
int val = 1;<br>
MPI_Allreduce(MPI_IN_PLACE, &val, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);<br>
printf("C val: %2d\n",val);<br>
}<br>
<br>
testf.f90<br>
-----------<br>
subroutine initf() bind(C)<br>
use mpi<br>
integer ierr<br>
call MPI_Init(ierr)<br>
end subroutine initf<br>
<br>
subroutine testf() bind(C)<br>
use mpi<br>
integer ierr<br>
integer val<br>
val = 1<br>
call MPI_Allreduce(MPI_IN_PLACE, val, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)<br>
print '(A,I2)', 'F val: ', val<br>
end subroutine testf<br>
<br>
test.py<br>
---------<br>
from ctypes import CDLL, RTLD_LOCAL, RTLD_GLOBAL<br>
<br>
mode = RTLD_LOCAL<br>
cfirst = True<br>
<br>
if cfirst: # it does not work!<br>
libc = CDLL("./testc.so", mode)<br>
libf = CDLL("./testf.so", mode)<br>
else: # it works!<br>
libf = CDLL("./testf.so", mode)<br>
libc = CDLL("./testc.so", mode)<br>
<br>
libc.initc.restype = None<br>
libc.testc.restype = None<br>
libf.initf.restype = None<br>
libf.testf.restype = None<br>
<br>
libc.initc()<br>
libc.testc()<br>
libf.testf()<o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
<p class="MsoNormal">_______________________________________________<br>
discuss mailing list <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><o:p></o:p></p>
</blockquote>
</div>
<p class="MsoNormal"><br clear="all">
<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal">-- <o:p></o:p></p>
<div>
<p class="MsoNormal">Jeff Hammond<br>
<a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>
<a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><o:p></o:p></p>
</div>
</blockquote>
</div>
</div>
</body>
</html>