[mpich-discuss] Using MPICH in Python breaks Fortran MPI_IN_PLACE

Zhou, Hui zhouh at anl.gov
Wed Jun 10 17:31:41 CDT 2020


I am not sure this is an MPICH issue; it seems to be more of an `mpi4py` issue. The problem is exactly as you suspected: `libmpifort.so` needs to be loaded before `libmpi.so`, or the external symbol in `libmpifort.so` won’t get resolved.

>  It works if you load something linked to libmpifort first or load everything with RTLD_GLOBAL.

Seems plausible. I don’t have a better idea than fixing `mpi4py` so that it always loads `libmpifort.so` first.
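
Until such a fix exists, an application could try preloading the Fortran library itself before anything pulls in `libmpi.so`. The following is only a minimal sketch of that idea and has not been verified against MPICH: the helper name `preload_global` is hypothetical, the bare file name `libmpifort.so` is an assumption about how the library is installed and found by the loader, and using `RTLD_GLOBAL` for the preload combines both workarounds quoted above.

```python
import ctypes


def preload_global(candidates=("libmpifort.so",)):
    """Try each candidate library name with RTLD_GLOBAL.

    Returns the ctypes handle of the first library that loads,
    or None if none of the candidates could be loaded.
    The default name is an assumption; adjust it to the local install.
    """
    for name in candidates:
        try:
            return ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            # Library not found or not loadable; try the next candidate.
            continue
    return None


if __name__ == "__main__":
    handle = preload_global()
    if handle is None:
        print("libmpifort.so could not be preloaded")
    else:
        print("libmpifort.so preloaded; now import mpi4py or load testc.so")
```

The call would have to run before `mpi4py` is imported or any C-only MPI library is opened, since the order of loading is what matters here.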

--
Hui Zhou


From: Patrick McNally via discuss <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Wednesday, June 10, 2020 at 1:09 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: Patrick McNally <rpmcnally at gmail.com>
Subject: Re: [mpich-discuss] Using MPICH in Python breaks Fortran MPI_IN_PLACE

I hate to bug, but this is a pretty serious issue.  I suspect it is why we get segfaults trying to use similar variables like MPI_STATUSES_IGNORE.  Any insight would be appreciated.

Thanks,
Patrick

On Wed, May 27, 2020 at 10:25 AM Patrick McNally <rpmcnally at gmail.com> wrote:
Our application consists primarily of a Python head calling into Fortran routines to do the heavy lifting.  We have never been able to successfully use MPI_IN_PLACE in Fortran but weren't sure why.  Recently, we discovered that it works fine in standalone Fortran code and is only broken when the Fortran code is run through our Python modules.

The issue appears to be related to having code that only links to the C libmpi library loaded first and with RTLD_LOCAL, as happens when we load mpi4py.  It works if you load something linked to libmpifort first or load everything with RTLD_GLOBAL.  I'm assuming this has something to do with how MPICH tests the address of MPIR_F08_MPI_IN_PLACE but I don't understand SO loading well enough to fully grasp the issue.  Below is some standalone code to show the issue.  I'd appreciate any insight you can provide into why this is happening.
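
For Python extension modules (as opposed to the explicit `CDLL` calls in test.py, which take a mode argument), the "load everything with RTLD_GLOBAL" workaround can be sketched with the interpreter's dlopen flags. This is an illustration under assumptions, not a tested fix: it uses `os.RTLD_GLOBAL`, which requires Python 3.3 or later on Unix rather than the Python 2.7 listed below, and the helper name `global_dlopen` is hypothetical.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def global_dlopen():
    """Temporarily add RTLD_GLOBAL to the flags Python uses for dlopen().

    Extension modules imported inside the block are loaded with their
    symbols globally visible; the previous flags are restored afterwards.
    """
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        sys.setdlopenflags(saved)


if __name__ == "__main__":
    with global_dlopen():
        # An application would import its MPI extension modules here,
        # for example mpi4py (not imported in this sketch).
        print("RTLD_GLOBAL set:", bool(sys.getdlopenflags() & os.RTLD_GLOBAL))
```

Restoring the saved flags keeps the change limited to the imports that need it, since global symbol visibility can cause name clashes between unrelated modules.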

Relevant system details:
RHEL 7.8
Python 2.7
GCC 7.3.0
MPICH 3.3.2 (and 3.2)

The below files are also available towards the end of the bug report at the following link:
https://bitbucket.org/mpi4py/mpi4py/issues/162/mpi4py-initialization-breaks-fortran

Thanks,
Patrick

makefile
-----------
libs = testc.so testf.so
all: $(libs)

testc.so: testc.c
        mpicc   -shared -fPIC $< -o $@

testf.so: testf.f90
        mpifort -shared -fPIC $< -o $@

clean:
        $(RM) $(libs)

testc.c
---------
#include <stddef.h>
#include <stdio.h>
#include <mpi.h>

extern void initc(void);
extern void testc(void);

void initc(void)
{
  MPI_Init(NULL,NULL);
}

void testc(void)
{
  int val = 1;
  MPI_Allreduce(MPI_IN_PLACE, &val, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  printf("C val: %2d\n",val);
}

testf.f90
-----------
subroutine initf() bind(C)
  use mpi
  integer ierr
  call MPI_Init(ierr)
end subroutine initf

subroutine testf() bind(C)
  use mpi
  integer ierr
  integer val
  val = 1
  call MPI_Allreduce(MPI_IN_PLACE, val, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
  print '(A,I2)', 'F val: ', val
end subroutine testf

test.py
---------
from ctypes import CDLL, RTLD_LOCAL, RTLD_GLOBAL

mode = RTLD_LOCAL
cfirst = True

if cfirst: # it does not work!
    libc = CDLL("./testc.so", mode)
    libf = CDLL("./testf.so", mode)
else: # it works!
    libf = CDLL("./testf.so", mode)
    libc = CDLL("./testc.so", mode)

libc.initc.restype  = None
libc.testc.argtypes = []
libf.initf.restype  = None
libf.testf.argtypes = []

libc.initc()
libc.testc()
libf.testf()