[mpich-discuss] MPI Window limit on mpich
Marvin Smith
Marvin.Smith at sncorp.com
Mon Sep 19 13:15:25 CDT 2016
Thanks for the quick reply. Do you see this limit increasing in the
future? In the meantime, I have a workaround that uses fewer windows.
Thanks,
Marvin
From: "Oden, Lena" <loden at anl.gov>
To: "discuss at mpich.org" <discuss at mpich.org>
Date: 09/19/2016 11:11 AM
Subject: Re: [mpich-discuss] MPI Window limit on mpich
Hi Marvin,
currently, this is a limitation inside MPICH.
For every new window, MPICH internally creates a new communicator, and
every communicator requires a new (unique) context ID;
the number of distinct context IDs is limited to 2048.
This context ID / communicator is needed for internal synchronization
(e.g. barriers): we have to ensure that this communication does not
interfere with other communication on other windows or communicators.
If you create additional communicators yourself, you will run into this
problem even earlier, because those communicators consume context IDs
from the same pool (the limit is per process).
Lena
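One way to stay under this limit is to attach many memory regions to a
single dynamic window, so only one context ID is consumed regardless of
how many regions there are. A minimal sketch of that approach (the region
count and sizes below are arbitrary, chosen purely for illustration):

#include <mpi.h>
#include <vector>

int main( int argc, char* argv[] )
{
    MPI_Init( &argc, &argv );

    // A single dynamic window consumes only one context ID.
    MPI_Win win;
    MPI_Win_create_dynamic( MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    // Attach many independent buffers to the same window instead of
    // creating one window per buffer. Counts/sizes are illustrative.
    const int num_regions = 4096;
    const MPI_Aint region_size = 1000;
    std::vector<char*> regions(num_regions);
    for( int i=0; i<num_regions; i++ )
    {
        regions[i] = new char[region_size];
        MPI_Win_attach( win, regions[i], region_size );
    }

    // ... RMA operations targeting the attached regions go here ...

    // Detach and free everything before freeing the window.
    for( int i=0; i<num_regions; i++ )
    {
        MPI_Win_detach( win, regions[i] );
        delete [] regions[i];
    }
    MPI_Win_free( &win );

    MPI_Finalize();
    return 0;
}

Note that with a dynamic window the remote side needs the attached base
addresses (obtained with MPI_Get_address and exchanged explicitly) before
it can target them with RMA operations.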
On Sep 19, 2016, at 12:49 PM, Marvin Smith <Marvin.Smith at sncorp.com>
wrote:
Good morning,
I wanted to present an issue I am having with MPICH and validate
whether this is a configuration problem, a limitation with MPICH, or a
bug.
I am writing an application that uses a large number of MPI windows,
each of which is given a relatively large amount of memory. This has
never been a problem before; however, we discovered that if you allocate
more than 2045 windows, a fatal error is raised.
Notes:
I am compiling using g++ 4.8.5 on Red Hat Enterprise Linux 7.2.
My MPICH version is listed at the bottom of this email. It was installed
via yum and is the RHEL default.
I have attached sample output, including the stdout/stderr. Also
included are a Makefile and a simple example.
The failure boundary is between 2045 and 2046 windows.
I have verified on my system that the problem repeats even if I
distribute the windows across multiple communicators (a sketch of that
variant follows these notes).
I have not tested yet against ompi or mvapich.
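A minimal sketch of that multi-communicator variant (not the exact code
that was run; the number of duplicated communicators is an arbitrary
choice for illustration):

#include <mpi.h>
#include <vector>

int main( int argc, char* argv[] )
{
    MPI_Init( &argc, &argv );

    const int num_comms   = 4;      // arbitrary
    const int num_windows = 2046;   // same count that fails on one communicator

    // Each MPI_Comm_dup already consumes one context ID.
    std::vector<MPI_Comm> comms(num_comms);
    for( int i=0; i<num_comms; i++ )
        MPI_Comm_dup( MPI_COMM_WORLD, &comms[i] );

    // Round-robin the windows across the duplicated communicators.
    std::vector<MPI_Win> windows(num_windows);
    for( int i=0; i<num_windows; i++ )
        MPI_Win_create_dynamic( MPI_INFO_NULL, comms[i % num_comms], &windows[i] );

    // Cleanup (never reached once the context-ID pool is exhausted).
    for( int i=0; i<num_windows; i++ )
        MPI_Win_free( &windows[i] );
    for( int i=0; i<num_comms; i++ )
        MPI_Comm_free( &comms[i] );

    MPI_Finalize();
    return 0;
}

Because communicators and windows draw context IDs from the same
per-process pool, spreading the windows over several communicators does
not raise the effective limit.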
#-----------------------------------------------------------------------------------------------#
#- Here is my output -#
#-----------------------------------------------------------------------------------------------#
mpirun -np 2 -hosts localhost ./mpi-win-test 2046 1
Initialized Rank: 0, Number Processors: 2, Hostname: test-machine
Initialized Rank: 1, Number Processors: 2, Hostname: test-machine
Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
MPI_Win_create_dynamic(154)..........:
MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win=0x10c1464)
failed
MPID_Win_create_dynamic(139).........:
win_init(254)........................:
MPIR_Comm_dup_impl(55)...............:
MPIR_Comm_copy(1552).................:
MPIR_Get_contextid(799)..............:
MPIR_Get_contextid_sparse_group(1146): Cannot allocate context ID because
of fragmentation (0/2048 free on this process; ignore_id=0)
Fatal error in MPI_Win_create_dynamic: Other MPI error, error stack:
MPI_Win_create_dynamic(154)..........:
MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win=0x19ef444)
failed
MPID_Win_create_dynamic(139).........:
win_init(254)........................:
MPIR_Comm_dup_impl(55)...............:
MPIR_Comm_copy(1552).................:
MPIR_Get_contextid(799)..............:
MPIR_Get_contextid_sparse_group(1146): Cannot allocate context ID because
of fragmentation (0/2048 free on this process; ignore_id=0)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
make: *** [run] Error 1
#------------------------------------------------------------------------------------------#
#- Here is my Sample Makefile -#
#------------------------------------------------------------------------------------------#
# Path to mpich on RHEL7
MPI_INCL=-I/usr/include/mpich-x86_64
MPI_LIBS=-L/usr/lib64/mpich/lib -lmpich
# C++11 flags (being lazy with std::string/std::stoi)
CXX_ARGS=-std=c++11
# Make the test
all: mpi-win-test
mpi-win-test: mpi-win-test.cpp
g++ $< -o $@ $(MPI_INCL) $(MPI_LIBS) $(CXX_ARGS)
# Args for application
NUM_WINDOWS=2046
USE_DYNAMIC=1
# Sample run usage
#
# Args:
# - Number of Windows
# - Type of windows (1 dynamic, 0 static)
run:
mpirun -np 2 -hosts localhost ./mpi-win-test $(NUM_WINDOWS) $(USE_DYNAMIC)
#------------------------------------------------------------------------------------------#
#- Here is my Sample Application -#
#------------------------------------------------------------------------------------------#
#include <mpi.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main( int argc, char* argv[] )
{
// Number of MPI Windows
int num_windows = std::stoi(argv[1]);
bool use_dynamic = std::stoi(argv[2]);
// Initialize MPI
MPI_Init( &argc, &argv );
// Get the rank and size
int rank, nprocs;
MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
// Get the processor name
char hostname[MPI_MAX_PROCESSOR_NAME];
int hostname_len;
MPI_Get_processor_name( hostname, &hostname_len);
// Print Message
for( int i=0; i<nprocs; i++ ){
MPI_Barrier(MPI_COMM_WORLD);
if( i == rank ){
std::cout << "Initialized Rank: " << rank << ", Number
Processors: " << nprocs << ", Hostname: " << hostname << std::endl;
}
}
// MPI Variables
vector<MPI_Aint> sdisp_remotes(num_windows);
vector<MPI_Aint> sdisp_locals(num_windows);
// Create MPI Windows
vector<MPI_Win> windows(num_windows);
int64_t buffer_size = 1000;
char* buffer = new char[buffer_size];
for( int i=0; i<num_windows; i++ )
{
if( use_dynamic )
{
MPI_Win_create_dynamic( MPI_INFO_NULL, MPI_COMM_WORLD,
&windows[i] );
}
else
{
MPI_Win_create( buffer,
buffer_size,
1,
MPI_INFO_NULL,
MPI_COMM_WORLD,
&windows[i] );
}
}
// The fatal error always occurs prior to reaching this point.
// More Code Here that I am removing for brevity
// Wait at the barrier
MPI_Barrier( MPI_COMM_WORLD );
// Remove all windows
for( int i=0; i<num_windows; i++)
{
// Destroy the MPI Window
MPI_Win_free( &windows[i] );
}
windows.clear();
// Clear buffer
delete [] buffer;
buffer = nullptr;
// Close MPI
MPI_Finalize();
return 0;
}
#--------------------------------------------------------------------------#
#- MPICH Version Output -#
#--------------------------------------------------------------------------#
MPICH Version: 3.0.4
MPICH Release date: Wed Apr 24 10:08:10 CDT 2013
MPICH Device: ch3:nemesis
MPICH configure: --build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu --program-prefix=
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
--libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib
--mandir=/usr/share/man --infodir=/usr/share/info --enable-sharedlibs=gcc
--enable-shared --enable-lib-depend --disable-rpath --enable-fc
--with-device=ch3:nemesis --with-pm=hydra:gforker
--sysconfdir=/etc/mpich-x86_64 --includedir=/usr/include/mpich-x86_64
--bindir=/usr/lib64/mpich/bin --libdir=/usr/lib64/mpich/lib
--datadir=/usr/share/mpich --mandir=/usr/share/man/mpich
--docdir=/usr/share/mpich/doc --htmldir=/usr/share/mpich/doc
--with-hwloc-prefix=system FC=gfortran F77=gfortran CFLAGS=-m64 -O2 -g
-pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC
CXXFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic -fPIC FCFLAGS=-m64 -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC
FFLAGS=-m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic -fPIC LDFLAGS=-Wl,-z,noexecstack MPICH2LIB_CFLAGS=-O2
-g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic MPICH2LIB_CXXFLAGS=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
MPICH2LIB_FCFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic MPICH2LIB_FFLAGS=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
MPICH CC: cc -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -fPIC -O2
MPICH CXX: c++ -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -fPIC -O2
MPICH F77: gfortran -m64 -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
-fPIC -O2
MPICH FC: gfortran -m64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -fPIC -O2
CONFIDENTIALITY NOTICE - SNC EMAIL: This email and any attachments are
confidential, may contain proprietary, protected, or export controlled
information, and are intended for the use of the intended recipients only.
Any review, reliance, distribution, disclosure, or forwarding of this
email and/or attachments outside of Sierra Nevada Corporation (SNC)
without express written approval of the sender, except to the extent
required to further properly approved SNC business purposes, is strictly
prohibited. If you are not the intended recipient of this email, please
notify the sender immediately, and delete all copies without reading,
printing, or saving in any manner. --- Thank You.
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss