[mpich-discuss] Maximum number of communicators

Nichols A. Romero naromero at alcf.anl.gov
Mon Mar 4 14:03:14 CST 2013


Dave et al.,

I am still conversing with the application developer about this issue. It looks like I jumped to the wrong conclusion based on the mpiP data that I collected.

I won't ask you to read the whole thread, but if you can, please read the last post by fgygi here:
http://fpmd.ucdavis.edu/qbox-list/viewtopic.php?f=3&t=219&start=10

I have asked him to try to create a reduced test case that uses MPI only. The code is written on top of ScaLAPACK; there is a for loop that creates a communicator and then destroys it, depending on the process's rank on the 2D processor grid. By the time the loop is over, MPI_Comm_create/MPI_Comm_free may have been called about 1000 times, but only one of those communicators will continue to exist.
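
A toy model of that loop may help frame the reduced test case. It is only a sketch: ContextIdPool and run_loop are hypothetical names, the pool is not MPICH's real context ID allocator, and the 2048 limit is simply the figure quoted further down in this thread. Each iteration takes one ID (standing in for MPI_Comm_create) and gives it back (standing in for MPI_Comm_free) unless it is the one communicator the rank keeps.

```python
class ContextIdPool:
    """Toy fixed-size pool of IDs. NOT MPICH's actual allocator."""

    def __init__(self, size=2048):  # 2048 is the limit quoted in this thread
        self.size = size
        self.free_ids = list(range(size))

    def acquire(self):
        # Stands in for MPI_Comm_create consuming a context ID.
        if not self.free_ids:
            raise RuntimeError("context ID pool exhausted")
        return self.free_ids.pop()

    def release(self, ctx_id):
        # Stands in for MPI_Comm_free returning the context ID.
        self.free_ids.append(ctx_id)

    def in_use(self):
        return self.size - len(self.free_ids)


def run_loop(iterations, keep_index, free_unused=True, pool_size=2048):
    """Create one communicator per iteration and keep only one of them.

    Returns how many IDs are still held when the loop ends.
    """
    pool = ContextIdPool(pool_size)
    for i in range(iterations):
        ctx_id = pool.acquire()
        if i != keep_index and free_unused:
            pool.release(ctx_id)
    return pool.in_use()
```

With matching frees, run_loop(1000, keep_index=3) leaves 1 ID in use. With free_unused=False the same call leaves 1000 in use, and any run longer than 2048 iterations raises RuntimeError. So in this model a balanced create/free loop of about 1000 iterations should never get near the limit, while a leak eventually will.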


----- Original Message -----
> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
> To: discuss at mpich.org
> Sent: Monday, February 25, 2013 2:17:04 PM
> Subject: Re: [mpich-discuss] Maximum number of communicators
> 
> I cannot reproduce this in a simple test code. I am convinced that
> this is an application error. I linked the application against mpiP
> and I see that calls to Comm_create outnumber calls to Comm_free by
> two to three orders of magnitude.
> 
> The application is templated C++ and I suspect that the C++ class
> destructor is not properly calling Comm_free.
> 
> ----- Original Message -----
> > From: "Dave Goodell" <goodell at mcs.anl.gov>
> > To: discuss at mpich.org
> > Sent: Sunday, February 24, 2013 3:36:54 PM
> > Subject: Re: [mpich-discuss] Maximum number of communicators
> > 
> > Test code indeed would be helpful.
> > 
> > You may get better warnings about context ID and/or communicator
> > leaks if you configure a stock MPICH with "--enable-g=all".
> > 
> > -Dave
> > 
> > On Feb 23, 2013, at 7:05 PM CST, Jeff Hammond wrote:
> > 
> > > It's 2048 in all MPICH2-derived implementations that I know about,
> > > including the ones used on BGP and BGQ.
> > > 
> > > Since I know what your issue is already, I think the problem has
> > > something to do with context id leaking.  You should post your
> > > test code.
> > > 
> > > Jeff
> > > 
> > > On Sat, Feb 23, 2013 at 6:24 PM, Nichols A. Romero
> > > <naromero at alcf.anl.gov> wrote:
> > >> Hi,
> > >> 
> > >> What is the maximum number of communicators in the MPICH that is
> > >> used on BG/P vs. that used on BG/Q?
> > >> 
> > >> Sent from my iPhone
> > >> _______________________________________________
> > >> discuss mailing list     discuss at mpich.org
> > >> To manage subscription options or unsubscribe:
> > >> https://lists.mpich.org/mailman/listinfo/discuss
> > > 
> > > 
> > > 
> > > --
> > > Jeff Hammond
> > > Argonne Leadership Computing Facility
> > > University of Chicago Computation Institute
> > > jhammond at alcf.anl.gov / (630) 252-5381
> > > http://www.linkedin.com/in/jeffhammond
> > > https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> > 
> > 
> 
> --
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne National Laboratory
> Building 240 Room 2-127
> 9700 South Cass Avenue
> Argonne, IL 60490
> (630) 252-3441
> 
> 

-- 
Nichols A. Romero, Ph.D.
Argonne Leadership Computing Facility
Argonne National Laboratory
Building 240 Room 2-127
9700 South Cass Avenue
Argonne, IL 60490
(630) 252-3441



