[mpich-discuss] MPI Window limit on mpich

Marvin Smith Marvin.Smith at sncorp.com
Mon Sep 19 12:49:51 CDT 2016


Good morning, 

    I want to report an issue I am having with MPICH and determine 
whether it is a configuration problem, a limitation of MPICH, or a 
bug. 

I am writing an application that uses a large number of MPI windows, each 
given a relatively large amount of memory. This has never been a problem 
before; however, we discovered that allocating more than 2045 windows 
causes an exception to be thrown.

Notes:
- I am compiling with g++ 4.8.5 on Red Hat Enterprise Linux 7.2.
- My MPICH version information is listed at the bottom of this email. It 
  was installed via yum and is the RHEL default.
- I have attached sample output, including stdout/stderr. Also included 
  are a Makefile and a simple example.
- The failure boundary is between 2045 and 2046 windows.
- I have verified on my system that the problem persists even if I 
  distribute the windows across multiple communicators.
- I have not yet tested against Open MPI or MVAPICH.


#-----------------------------------------------------------------------------------------------#
#-                                       Here is my output                                      -#
#-----------------------------------------------------------------------------------------------#

mpirun -np 2 -hosts localhost ./mpi-win-test 2046 1
Initialized Rank: 0, Number Processors: 2, Hostname: test-machine
Initialized Rank: 1, Number Processors: 2, Hostname: test-machine
Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
MPI_Win_create_dynamic(154)..........: 
MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win=0x10c1464) 
failed
MPID_Win_create_dynamic(139).........: 
win_init(254)........................: 
MPIR_Comm_dup_impl(55)...............: 
MPIR_Comm_copy(1552).................: 
MPIR_Get_contextid(799)..............: 
MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID because 
of fragmentation (0/2048 free on this process; ignore_id=0)
Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
MPI_Win_create_dynamic(154)..........: 
MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win=0x19ef444) 
failed
MPID_Win_create_dynamic(139).........: 
win_init(254)........................: 
MPIR_Comm_dup_impl(55)...............: 
MPIR_Comm_copy(1552).................: 
MPIR_Get_contextid(799)..............: 
MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID because 
of fragmentation (0/2048 free on this process; ignore_id=0)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
make: *** [run] Error 1



#------------------------------------------------------------------------------------------#
#-                                 Here is my Sample Makefile                              -#
#------------------------------------------------------------------------------------------#

#  Path to mpich on RHEL7
MPI_INCL=-I/usr/include/mpich-x86_64
MPI_LIBS=-L/usr/lib64/mpich/lib -lmpich

#  C++11 Bindings (Being lazy with string)
CXX_ARGS=-std=c++11

#  Make the test
all: mpi-win-test

mpi-win-test: mpi-win-test.cpp
        g++ $< -o $@ $(MPI_INCL) $(MPI_LIBS) $(CXX_ARGS)


#  Args for application
NUM_WINDOWS=2046
USE_DYNAMIC=1

#  Sample run usage
#
#  Args:
#    - Number of windows
#    - Type of windows (1 dynamic, 0 static)
run:
        mpirun -np 2 -hosts localhost ./mpi-win-test $(NUM_WINDOWS) $(USE_DYNAMIC)



#------------------------------------------------------------------------------------------#
#-                               Here is my Sample Application                             -#
#------------------------------------------------------------------------------------------#


#include <mpi.h>

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main( int argc, char* argv[] )
{
    // Number of MPI Windows
    int num_windows = std::stoi(argv[1]);

    bool use_dynamic = std::stoi(argv[2]);

    // Initialize MPI
    MPI_Init( &argc, &argv );

    // Get the rank and size
    int rank, nprocs;
    MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    // Get the processor name
    char hostname[MPI_MAX_PROCESSOR_NAME];
    int hostname_len;
    MPI_Get_processor_name( hostname, &hostname_len);

    // Print Message
    for( int i=0; i<nprocs; i++ ){
        MPI_Barrier(MPI_COMM_WORLD);
        if( i == rank ){
            std::cout << "Initialized Rank: " << rank
                      << ", Number Processors: " << nprocs
                      << ", Hostname: " << hostname << std::endl;
        }
    }
 
 
    // MPI Variables
    vector<MPI_Aint>  sdisp_remotes(num_windows);
    vector<MPI_Aint>  sdisp_locals(num_windows);

    // Create MPI Windows
    vector<MPI_Win> windows(num_windows);
 
    int64_t buffer_size = 1000;
    char*   buffer = new char[buffer_size];

    for( int i=0; i<num_windows; i++ )
    { 
        if( use_dynamic )
        {
            MPI_Win_create_dynamic( MPI_INFO_NULL, MPI_COMM_WORLD, &windows[i] );
        }

        else
        {
            MPI_Win_create( buffer,   // note: buffer, not &buffer
                            buffer_size, 
                            1, 
                            MPI_INFO_NULL, 
                            MPI_COMM_WORLD,
                            &windows[i] );
        } 
    }
 
 
    // Exception always occurs prior to reaching this point.
 
 
    // More Code Here that I am removing for brevity

    // Wait at the barrier
    MPI_Barrier( MPI_COMM_WORLD );

    // Remove all windows
    for( int i=0; i<num_windows; i++)
    {
        // Destroy the MPI Window
        MPI_Win_free( &windows[i] );
    }
    windows.clear();

    // Clear buffer 
    delete [] buffer;
    buffer = nullptr;
 
    // Close MPI
    MPI_Finalize();

    return 0;
}
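As an aside, if the underlying goal is many independently exposed memory 
regions rather than many windows, one possible way around the context-ID 
limit is a single dynamic window with many regions attached via 
MPI_Win_attach, so only one communicator duplication occurs. This is a 
hedged sketch of that approach, not something I have validated against 
this failure case; it needs an MPI installation and mpirun to build and 
run:

```cpp
#include <mpi.h>

#include <vector>

int main( int argc, char* argv[] )
{
    MPI_Init( &argc, &argv );

    const int      num_regions = 2046;  // same count that failed before
    const MPI_Aint region_size = 1000;

    // One window -> one internal communicator dup -> one context ID total.
    MPI_Win win;
    MPI_Win_create_dynamic( MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    // Attach each buffer to the same dynamic window.
    std::vector<char*> buffers(num_regions);
    for( int i=0; i<num_regions; i++ )
    {
        buffers[i] = new char[region_size];
        MPI_Win_attach( win, buffers[i], region_size );
    }

    // ... RMA epochs / communication would go here ...

    // Detach and free everything.
    for( int i=0; i<num_regions; i++ )
    {
        MPI_Win_detach( win, buffers[i] );
        delete [] buffers[i];
    }
    MPI_Win_free( &win );

    MPI_Finalize();
    return 0;
}
```

Remote ranks would still need the attached base addresses (e.g. via 
MPI_Get_address and a send/recv exchange) before issuing RMA operations.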

#--------------------------------------------------------------------------#
#-                           MPICH Version Output                          -#
#--------------------------------------------------------------------------#
MPICH Version:          3.0.4
MPICH Release date:     Wed Apr 24 10:08:10 CDT 2013
MPICH Device:           ch3:nemesis
MPICH configure:        --build=x86_64-redhat-linux-gnu 
--host=x86_64-redhat-linux-gnu --program-prefix= 
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr 
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc 
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 
--libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib 
--mandir=/usr/share/man --infodir=/usr/share/info --enable-sharedlibs=gcc 
--enable-shared --enable-lib-depend --disable-rpath --enable-fc 
--with-device=ch3:nemesis --with-pm=hydra:gforker 
--sysconfdir=/etc/mpich-x86_64 --includedir=/usr/include/mpich-x86_64 
--bindir=/usr/lib64/mpich/bin --libdir=/usr/lib64/mpich/lib 
--datadir=/usr/share/mpich --mandir=/usr/share/man/mpich 
--docdir=/usr/share/mpich/doc --htmldir=/usr/share/mpich/doc 
--with-hwloc-prefix=system FC=gfortran F77=gfortran CFLAGS=-m64 -O2 -g 
-pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC 
CXXFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-m64 -mtune=generic -fPIC FCFLAGS=-m64 -O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC 
FFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-m64 -mtune=generic -fPIC LDFLAGS=-Wl,-z,noexecstack MPICH2LIB_CFLAGS=-O2 
-g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-m64 -mtune=generic MPICH2LIB_CXXFLAGS=-O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic 
MPICH2LIB_FCFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-m64 -mtune=generic MPICH2LIB_FFLAGS=-O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
MPICH CC:       cc -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic -fPIC   -O2
MPICH CXX:      c++ -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
MPICH F77:      gfortran -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
MPICH FC:       gfortran -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 
-grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
