[mpich-discuss] MPI Window limit on mpich

Jeff Hammond jeff.science at gmail.com
Mon Sep 19 17:31:26 CDT 2016


The upper limit has not changed by more than one bit (102X to 204X) for as
long as I have been paying attention.  The number of bits in
(comm,tag,rank) has to be kept small.  I don't know the details in MPICH,
but the issue is discussed in
https://cug.org/5-publications/proceedings_attendee_lists/CUG09CD/S09_Proceedings/pages/authors/01-5Monday/3C-Pagel/pagel-paper.pdf
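
As a rough illustration of the arithmetic behind the limit reported further
down this thread: the error stack shows a pool of 2048 context IDs per
process, and the failure appears at the 2046th window, which suggests that 3
IDs were already taken before the first window was created.  The sketch below
models that with a hypothetical ContextIdPool class and max_windows helper.
Neither is MPICH's actual data structure or API, and the figure of 3
pre-allocated IDs is an inference from the numbers in the thread, not
something the MPICH developers state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a per-process context ID pool.
// This is an illustration only, not MPICH's internal implementation.
class ContextIdPool {
public:
    explicit ContextIdPool(std::size_t capacity) : in_use_(capacity, false) {}

    // Hands out the lowest free ID, or returns -1 when the pool is exhausted.
    long allocate() {
        for (std::size_t id = 0; id < in_use_.size(); ++id) {
            if (!in_use_[id]) {
                in_use_[id] = true;
                return static_cast<long>(id);
            }
        }
        return -1;
    }

private:
    std::vector<bool> in_use_;
};

// Counts how many windows can be created when every window duplicates a
// communicator and therefore consumes exactly one context ID.
// `preallocated` is the number of IDs assumed to be taken before the first
// window is created (an assumption inferred from the thread, see above).
std::size_t max_windows(std::size_t capacity, std::size_t preallocated) {
    assert(preallocated <= capacity);
    ContextIdPool pool(capacity);
    for (std::size_t i = 0; i < preallocated; ++i) {
        pool.allocate();
    }
    std::size_t windows = 0;
    while (pool.allocate() >= 0) {
        ++windows;
    }
    return windows;
}
```

Under these assumptions, max_windows(2048, 3) evaluates to 2045, which
matches the boundary reported below.  Every additional communicator the
application creates would lower that count by one.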

A better option for RMA programs would be a window info key that disables
the duplication of a communicator for every window.  The implementation
would then have to generate communicators on the fly for window collectives
(e.g. MPI_Win_fence and MPI_Win_free), but many RMA programs do not use
MPI_Win_fence (the most reasonable use of RMA is passive target with flush
synchronization, which is non-collective).  If MPI_Win_free is on the
critical path, the application is poorly written.

Jeff

On Mon, Sep 19, 2016 at 11:15 AM, Marvin Smith <Marvin.Smith at sncorp.com>
wrote:

> Thanks for the quick reply.   Do you see this value increasing in the
> future?  In the meantime, I have a solution with fewer windows.
>
> Thanks,
> Marvin
>
>
>
> From:        "Oden, Lena" <loden at anl.gov>
> To:        "discuss at mpich.org" <discuss at mpich.org>
> Date:        09/19/2016 11:11 AM
> Subject:        Re: [mpich-discuss] MPI Window limit on mpich
> ------------------------------
>
>
>
> Hi Marvin,
>
> Currently, this is a limitation inside MPICH.
> For every new window, MPICH internally creates a new communicator. Every
> communicator requires a new (unique) context ID, and the number of
> distinct context IDs is limited to 2048.
>
> This context ID/communicator is required for internal synchronization
> (e.g. barriers). We have to ensure that this communication does not
> interfere with other communication on other windows or communicators.
>
> If you use more communicators, you should run into this problem earlier,
> because the communicators you already created also consume context IDs
> (the limit is per process).
>
> Lena
>
> On Sep 19, 2016, at 12:49 PM, Marvin Smith <Marvin.Smith at sncorp.com> wrote:
>
> Good morning,
>
>    I wanted to present an issue I am having with MPICH and validate
> whether this is a configuration problem, a limitation with MPICH, or a bug.
>
> I am writing an application which uses a large number of MPI windows, each
> of which is given a relatively large amount of memory.  This has never been
> a problem before; however, we discovered that if you allocate more than
> 2045 windows, an exception is thrown.
>
> Notes:
>
>    - I am compiling using g++, version 4.8.5  on Red Hat Enterprise Linux
>    version 7.2.
>    - My MPICH version is listed at the bottom of this email.  It was
>    installed via yum and is the RHEL default.
>    - I have attached sample output, to include the stdout/stderr.  Also
>    included is a Makefile and a simple example.
>    - The boundary of failure is between 2045 and 2046 windows.
>    - I have verified on my system this problem repeats even if I
>    distribute windows between multiple communicators.
>    - I have not tested yet against ompi or mvapich.
>
>
>
> #--------------------------------------------------------------------------#
> #-                           Here is my output                            -#
> #--------------------------------------------------------------------------#
>
> mpirun -np 2 -hosts localhost ./mpi-win-test 2046 1
> Initialized Rank: 0, Number Processors: 2, Hostname: test-machine
> Initialized Rank: 1, Number Processors: 2, Hostname: test-machine
> Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
> MPI_Win_create_dynamic(154)..........: MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win=0x10c1464) failed
> MPID_Win_create_dynamic(139).........:
> win_init(254)........................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1146): Cannot allocate context ID because of fragmentation (0/2048 free on this process; ignore_id=0)
> Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
> MPI_Win_create_dynamic(154)..........: MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win=0x19ef444) failed
> MPID_Win_create_dynamic(139).........:
> win_init(254)........................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1146): Cannot allocate context ID because of fragmentation (0/2048 free on this process; ignore_id=0)
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> make: *** [run] Error 1
>
>
>
> #--------------------------------------------------------------------------#
> #-                       Here is my Sample Makefile                       -#
> #--------------------------------------------------------------------------#
>
> #  Path to mpich on RHEL7
> MPI_INCL=-I/usr/include/mpich-x86_64
> MPI_LIBS=-L/usr/lib64/mpich/lib -lmpich
>
> #  C++11 Bindings (Being lazy with string)
> CXX_ARGS=-std=c++11
>
> #  Make the test
> all: mpi-win-test
>
> mpi-win-test: mpi-win-test.cpp
>        g++ $< -o $@ $(MPI_INCL) $(MPI_LIBS) $(CXX_ARGS)
>
>
> #  Args for application
> NUM_WINDOWS=2046
> USE_DYNAMIC=1
>
> #  Sample run usage
> #
> #        Args:
> #          - Number of Windows
> #          - Type of windows (1 dynamic, 0 static)
> run:
>        mpirun -np 2 -hosts localhost ./mpi-win-test $(NUM_WINDOWS) $(USE_DYNAMIC)
>
>
>
> #--------------------------------------------------------------------------#
> #-                     Here is my Sample Application                      -#
> #--------------------------------------------------------------------------#
>
>
> #include <mpi.h>
>
> #include <cstdint>
> #include <iostream>
> #include <string>
> #include <vector>
>
> using namespace std;
>
> int main( int argc, char* argv[] )
> {
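>    // Initialize MPI
>    MPI_Init( &argc, &argv );
>
>    // Both arguments are required
>    if( argc < 3 ){
>        std::cerr << "Usage: " << argv[0] << " <num-windows> <use-dynamic>" << std::endl;
>        MPI_Finalize();
>        return 1;
>    }
>
>    // Number of MPI Windows and window type (1 dynamic, 0 static)
>    int  num_windows = std::stoi(argv[1]);
>    bool use_dynamic = std::stoi(argv[2]) != 0;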
>
>    // Get the rank and size
>    int rank, nprocs;
>    MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
>    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>
>    // Get the processor name
>    char hostname[MPI_MAX_PROCESSOR_NAME];
>    int hostname_len;
>    MPI_Get_processor_name( hostname, &hostname_len);
>
>    // Print Message
>    for( int i=0; i<nprocs; i++ ){
>        MPI_Barrier(MPI_COMM_WORLD);
>        if( i == rank ){
>            std::cout << "Initialized Rank: " << rank
>                      << ", Number Processors: " << nprocs
>                      << ", Hostname: " << hostname << std::endl;
>        }
>    }
>
>
>    // MPI Variables
>    vector<MPI_Aint>  sdisp_remotes(num_windows);
>    vector<MPI_Aint>  sdisp_locals(num_windows);
>
>    // Create MPI Windows
>    vector<MPI_Win> windows(num_windows);
>
>    int64_t buffer_size = 1000;
>    char*   buffer = new char[buffer_size];
>
>    for( int i=0; i<num_windows; i++ )
>    {
>        if( use_dynamic )
>        {
>            MPI_Win_create_dynamic( MPI_INFO_NULL,
>                                    MPI_COMM_WORLD,
>                                    &windows[i] );
>        }
>
>        else
>        {
>            // Pass the buffer itself, not the address of the pointer
>            MPI_Win_create( buffer,
>                            buffer_size,
>                            1,
>                            MPI_INFO_NULL,
>                            MPI_COMM_WORLD,
>                            &windows[i] );
>        }
>    }
>
>
>    // Exception always occurs prior to reaching this point.
>
>
>    // More Code Here that I am removing for brevity
>
>    // Wait at the barrier
>    MPI_Barrier( MPI_COMM_WORLD );
>
>    // Remove all windows
>    for( int i=0; i<num_windows; i++)
>    {
>        // Destroy the MPI Window
>        MPI_Win_free( &windows[i] );
>    }
>    windows.clear();
>
>    // Clear buffer
>    delete [] buffer;
>    buffer = nullptr;
>
>    // Close MPI
>    MPI_Finalize();
>
>    return 0;
> }
>
> #--------------------------------------------------------------------------#
> #-                          MPICH Version Output                          -#
> #--------------------------------------------------------------------------#
> MPICH Version:            3.0.4
> MPICH Release date:        Wed Apr 24 10:08:10 CDT 2013
> MPICH Device:            ch3:nemesis
> MPICH configure:         --build=x86_64-redhat-linux-gnu
> --host=x86_64-redhat-linux-gnu --program-prefix=
> --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
> --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
> --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
> --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib
> --mandir=/usr/share/man --infodir=/usr/share/info --enable-sharedlibs=gcc
> --enable-shared --enable-lib-depend --disable-rpath --enable-fc
> --with-device=ch3:nemesis --with-pm=hydra:gforker
> --sysconfdir=/etc/mpich-x86_64 --includedir=/usr/include/mpich-x86_64
> --bindir=/usr/lib64/mpich/bin --libdir=/usr/lib64/mpich/lib
> --datadir=/usr/share/mpich --mandir=/usr/share/man/mpich
> --docdir=/usr/share/mpich/doc --htmldir=/usr/share/mpich/doc
> --with-hwloc-prefix=system FC=gfortran F77=gfortran CFLAGS=-m64 -O2 -g
> -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC
> CXXFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic -fPIC FCFLAGS=-m64 -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC
> FFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic -fPIC LDFLAGS=-Wl,-z,noexecstack MPICH2LIB_CFLAGS=-O2
> -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic MPICH2LIB_CXXFLAGS=-O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
> MPICH2LIB_FCFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
> -m64 -mtune=generic MPICH2LIB_FFLAGS=-O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
> MPICH CC:         cc -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
> -grecord-gcc-switches   -m64 -mtune=generic -fPIC   -O2
> MPICH CXX:         c++ -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
> -grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
> MPICH F77:         gfortran -m64 -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
> --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -fPIC
>  -O2
> MPICH FC:         gfortran -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
> -grecord-gcc-switches   -m64 -mtune=generic -fPIC  -O2
> CONFIDENTIALITY NOTICE - SNC EMAIL: This email and any attachments are
> confidential, may contain proprietary, protected, or export controlled
> information, and are intended for the use of the intended recipients only.
> Any review, reliance, distribution, disclosure, or forwarding of this email
> and/or attachments outside of Sierra Nevada Corporation (SNC) without
> express written approval of the sender, except to the extent required to
> further properly approved SNC business purposes, is strictly prohibited. If
> you are not the intended recipient of this email, please notify the sender
> immediately, and delete all copies without reading, printing, or saving in
> any manner. --- Thank You.
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/