[mpich-discuss] MPI Window limit on mpich

Jeff Hammond jeff.science at gmail.com
Mon Sep 19 13:11:39 CDT 2016


I believe that creating 2046 communicators will also fail, so this doesn't
have anything to do with windows themselves, just their internal need to
duplicate an associated communicator (note MPIR_Comm_dup_impl in the error
stack below).

MPICH's inability to create more than ~2000 communicators per process is a
known limitation.
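
For illustration (this sketch is mine, not part of the original report), you
can hit the same wall with no windows at all by duplicating MPI_COMM_WORLD in
a loop; the ~2048 cutoff is inferred from the "0/2048 free" text in the error
output below:

#include <mpi.h>

#include <cstdio>
#include <vector>

int main( int argc, char* argv[] )
{
    MPI_Init( &argc, &argv );

    // Return error codes instead of aborting so the failure is observable
    MPI_Comm_set_errhandler( MPI_COMM_WORLD, MPI_ERRORS_RETURN );

    // All ranks execute the loop collectively, so they should run out of
    // context IDs at the same iteration
    std::vector<MPI_Comm> comms;
    for( int i = 0; i < 4096; i++ )
    {
        MPI_Comm dup;
        if( MPI_Comm_dup( MPI_COMM_WORLD, &dup ) != MPI_SUCCESS )
        {
            std::printf( "MPI_Comm_dup failed after %d duplicates\n", i );
            break;
        }
        comms.push_back( dup );
    }

    for( auto& c : comms ) MPI_Comm_free( &c );

    MPI_Finalize();
    return 0;
}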

Why do you need to create ~2000 dynamic windows?  I can't think of a
reasonable application that would do this.
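
If the underlying goal is simply to have many independently allocated buffers
available for RMA, one alternative (a minimal sketch of my own, not something
proposed in this thread) is a single dynamic window with many attached
regions; MPI_Win_attach does not duplicate a communicator, so only one
context ID is consumed in total:

#include <mpi.h>

#include <vector>

int main( int argc, char* argv[] )
{
    MPI_Init( &argc, &argv );

    const int      num_regions = 2046;
    const MPI_Aint region_size = 1000;

    // One dynamic window shared by all regions
    MPI_Win win;
    MPI_Win_create_dynamic( MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    std::vector<char*> buffers( num_regions );
    for( int i = 0; i < num_regions; i++ )
    {
        buffers[i] = new char[region_size];
        MPI_Win_attach( win, buffers[i], region_size );
    }

    // ... RMA traffic would go here; remote ranks need the attached
    // addresses, e.g. exchanged via MPI_Get_address and point-to-point ...

    for( int i = 0; i < num_regions; i++ )
    {
        MPI_Win_detach( win, buffers[i] );
        delete [] buffers[i];
    }
    MPI_Win_free( &win );

    MPI_Finalize();
    return 0;
}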

Jeff

On Mon, Sep 19, 2016 at 10:49 AM, Marvin Smith <Marvin.Smith at sncorp.com>
wrote:

> Good morning,
>
>     I wanted to present an issue I am having with MPICH and validate
> whether this is a configuration problem, a limitation with MPICH, or a bug.
>
> I am writing an application which uses a large number of MPI windows, each
> of which is given a relatively large amount of memory.  This has never been
> a problem before; however, we discovered that if you allocate more than
> 2045 windows, an exception is thrown.
>
> Notes:
>
>    - I am compiling using g++ version 4.8.5 on Red Hat Enterprise Linux
>    version 7.2.
>    - My MPICH version is listed at the bottom of this email.  It was
>    installed via yum and is the RHEL default.
>    - I have attached sample output, including the stdout/stderr.  Also
>    included is a Makefile and a simple example.
>    - The boundary of failure is between 2045 and 2046 windows.
>    - I have verified on my system this problem repeats even if I
>    distribute windows between multiple communicators.
>    - I have not yet tested against Open MPI or MVAPICH.
>
>
>
> #-------------------------------------------------------------------------------------------#
> #-                                      Here is my output                                   -#
> #-------------------------------------------------------------------------------------------#
>
> mpirun -np 2 -hosts localhost ./mpi-win-test 2046 1
> Initialized Rank: 0, Number Processors: 2, Hostname: test-machine
> Initialized Rank: 1, Number Processors: 2, Hostname: test-machine
> Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
> MPI_Win_create_dynamic(154)..........: MPI_Win_create_dynamic(MPI_INFO_NULL,
> MPI_COMM_WORLD, win=0x10c1464) failed
> MPID_Win_create_dynamic(139).........:
> win_init(254)........................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID
> because of fragmentation (0/2048 free on this process; ignore_id=0)
> Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
> MPI_Win_create_dynamic(154)..........: MPI_Win_create_dynamic(MPI_INFO_NULL,
> MPI_COMM_WORLD, win=0x19ef444) failed
> MPID_Win_create_dynamic(139).........:
> win_init(254)........................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID
> because of fragmentation (0/2048 free on this process; ignore_id=0)
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> make: *** [run] Error 1
>
>
>
>
> #------------------------------------------------------------------------------------------#
> #-                                  Here is my Sample Makefile                             -#
> #------------------------------------------------------------------------------------------#
>
> #  Path to mpich on RHEL7
> MPI_INCL=-I/usr/include/mpich-x86_64
> MPI_LIBS=-L/usr/lib64/mpich/lib -lmpich
>
> #  C++11 Bindings (Being lazy with string)
> CXX_ARGS=-std=c++11
>
> #  Make the test
> all: mpi-win-test
>
> mpi-win-test: mpi-win-test.cpp
>         g++ $< -o $@ $(MPI_INCL) $(MPI_LIBS) $(CXX_ARGS)
>
>
> #  Args for application
> NUM_WINDOWS=2046
> USE_DYNAMIC=1
>
> #  Sample run usage
> #
> #        Args:
> #          - Number of Windows
> #          - Type of windows (1 dynamic, 0 static)
> run:
>         mpirun -np 2 -hosts localhost ./mpi-win-test $(NUM_WINDOWS) $(USE_DYNAMIC)
>
>
>
>
> #------------------------------------------------------------------------------------------#
> #-                                Here is my Sample Application                            -#
> #------------------------------------------------------------------------------------------#
>
>
> #include <mpi.h>
>
> #include <cstdint>
> #include <iostream>
> #include <string>
> #include <vector>
>
> using namespace std;
>
> int main( int argc, char* argv[] )
> {
>     // Number of MPI windows to create
>     int num_windows = std::stoi(argv[1]);
>
>     // Window type flag (1 = dynamic, 0 = static)
>     bool use_dynamic = (std::stoi(argv[2]) != 0);
>
>     // Initialize MPI
>     MPI_Init( &argc, &argv );
>
>     // Get the rank and size
>     int rank, nprocs;
>     MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
>     MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>
>     // Get the processor name
>     char hostname[MPI_MAX_PROCESSOR_NAME];
>     int hostname_len;
>     MPI_Get_processor_name( hostname, &hostname_len);
>
>     // Print Message
>     for( int i=0; i<nprocs; i++ ){
>         MPI_Barrier(MPI_COMM_WORLD);
>         if( i == rank ){
>             std::cout << "Initialized Rank: " << rank << ", Number Processors: " << nprocs << ", Hostname: " << hostname << std::endl;
>         }
>     }
>
>
>     // MPI Variables
>     vector<MPI_Aint>  sdisp_remotes(num_windows);
>     vector<MPI_Aint>  sdisp_locals(num_windows);
>
>     // Create MPI Windows
>     vector<MPI_Win> windows(num_windows);
>
>     int64_t buffer_size = 1000;
>     char*   buffer = new char[buffer_size];
>
>     for( int i=0; i<num_windows; i++ )
>     {
>         if( use_dynamic )
>         {
>             MPI_Win_create_dynamic( MPI_INFO_NULL, MPI_COMM_WORLD,
> &windows[i] );
>         }
>
>         else
>         {
>             // Note: pass the buffer itself as the window base address,
>             // not the address of the pointer variable.
>             MPI_Win_create( buffer,
>                             buffer_size,
>                             1,
>                             MPI_INFO_NULL,
>                             MPI_COMM_WORLD,
>                             &windows[i] );
>     }
>
>
>     // Exception always occurs prior to reaching this point.
>
>
>     // More Code Here that I am removing for brevity
>
>     // Wait at the barrier
>     MPI_Barrier( MPI_COMM_WORLD );
>
>     // Remove all windows
>     for( int i=0; i<num_windows; i++)
>     {
>         // Destroy the MPI Window
>         MPI_Win_free( &windows[i] );
>     }
>     windows.clear();
>
>     // Clear buffer
>     delete [] buffer;
>     buffer = nullptr;
>
>     // Close MPI
>     MPI_Finalize();
>
>     return 0;
> }
>
>
> #--------------------------------------------------------------------------#
> #-                           MPICH Version Output                          -#
> #--------------------------------------------------------------------------#
> MPICH Version:            3.0.4
> MPICH Release date:        Wed Apr 24 10:08:10 CDT 2013
> MPICH Device:            ch3:nemesis
> MPICH configure:         --build=x86_64-redhat-linux-gnu
> --host=x86_64-redhat-linux-gnu --program-prefix=
> --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
> --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
> --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
> --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib
> --mandir=/usr/share/man --infodir=/usr/share/info --enable-sharedlibs=gcc
> --enable-shared --enable-lib-depend --disable-rpath --enable-fc
> --with-device=ch3:nemesis --with-pm=hydra:gforker
> --sysconfdir=/etc/mpich-x86_64 --includedir=/usr/include/mpich-x86_64
> --bindir=/usr/lib64/mpich/bin --libdir=/usr/lib64/mpich/lib
> --datadir=/usr/share/mpich --mandir=/usr/share/man/mpich
> --docdir=/usr/share/mpich/doc --htmldir=/usr/share/mpich/doc
> --with-hwloc-prefix=system FC=gfortran F77=gfortran CFLAGS=-m64 -O2 -g
> -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC
> CXXFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic -fPIC FCFLAGS=-m64 -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC
> FFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic -fPIC LDFLAGS=-Wl,-z,noexecstack MPICH2LIB_CFLAGS=-O2
> -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic MPICH2LIB_CXXFLAGS=-O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
> MPICH2LIB_FCFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic MPICH2LIB_FFLAGS=-O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
> MPICH CC:         cc -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
> -grecord-gcc-switches   -m64 -mtune=generic -fPIC   -O2
> MPICH CXX:         c++ -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
> -grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
> MPICH F77:         gfortran -m64 -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -fPIC
>  -O2
> MPICH FC:         gfortran -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
> -grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


More information about the discuss mailing list