[mpich-discuss] Maximum number of communicators

Dave Goodell goodell at mcs.anl.gov
Mon Mar 4 14:39:11 CST 2013


Nick,

Re: the linked thread:

Nobody should be using MPICH 1.2.7p1 (old MPICH1, not new MPICH >=v3.0).  Drawing conclusions about MPI-on-BG/Q behavior from that ancient software is a very bad idea.


I would be happy to discuss communicators with you in person, if that would help.  The ways these limits get hit can be complicated.  Basically, a given process can only be a member of ~2048 communicators simultaneously.  Furthermore, the usable number can be much lower than that, because a new communicator needs a context ID that is free on every process that will belong to it, so it depends on which context IDs are free on the other processes.
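
To make the "free on other processes" point concrete, here is a toy model in C.  It is only a sketch, not MPICH code: the names (ctx_pool, ctx_common_free_count, and so on) are made up for illustration, and it assumes exactly 2048 IDs tracked as a bitmask per process, with a new communicator needing an ID that is free in every participant's mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only (hypothetical names, not MPICH internals):
 * 2048 context IDs tracked as 64 words of 32 bits.
 * A set bit means "this ID is free on this process". */
#define CTX_WORDS 64
#define CTX_BITS  32
#define CTX_TOTAL (CTX_WORDS * CTX_BITS) /* 2048 */

typedef struct {
    uint32_t free_mask[CTX_WORDS];
} ctx_pool;

/* Mark every ID as free. */
void ctx_pool_init(ctx_pool *p)
{
    for (size_t w = 0; w < CTX_WORDS; w++)
        p->free_mask[w] = UINT32_MAX;
}

/* Mark one ID (0..CTX_TOTAL-1) as in use on this process. */
void ctx_pool_take(ctx_pool *p, int id)
{
    assert(id >= 0 && id < CTX_TOTAL);
    p->free_mask[id / CTX_BITS] &= ~(UINT32_C(1) << (id % CTX_BITS));
}

/* Count the set bits in one word. */
int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of IDs free on a single process. */
int ctx_pool_free_count(const ctx_pool *p)
{
    int n = 0;
    for (size_t w = 0; w < CTX_WORDS; w++)
        n += popcount32(p->free_mask[w]);
    return n;
}

/* Number of IDs free on every process at once:
 * the bitwise AND of all the per-process masks. */
int ctx_common_free_count(const ctx_pool *pools, size_t nprocs)
{
    int n = 0;
    for (size_t w = 0; w < CTX_WORDS; w++) {
        uint32_t common = UINT32_MAX;
        for (size_t i = 0; i < nprocs; i++)
            common &= pools[i].free_mask[w];
        n += popcount32(common);
    }
    return n;
}
```

In this model, if one process holds IDs 0 through 1023 and another holds IDs 1024 through 2046, each still has over a thousand IDs free locally, yet ctx_common_free_count reports only one ID that both could agree on.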

With that said, however, the most common case where applications run out of context IDs is still usually a communicator leak of some sort.
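
As a rough illustration of why a leak eventually hits the limit, here is a small C sketch.  It does not call MPI at all: sim_comm_create and sim_comm_free are hypothetical stand-ins for the real create/free calls, and the limit is simply assumed to be exactly 2048 live communicators per process.

```c
#include <assert.h>

/* Assumed per-process limit, taken from the approximate figure above. */
#define SIM_CTX_LIMIT 2048

/* Hypothetical bookkeeping for one simulated process. */
typedef struct {
    int live;    /* communicators currently holding a context ID */
    int created; /* total successful creates */
    int freed;   /* total frees */
} sim_comm_stats;

/* Stand-in for a communicator create call.
 * Returns 1 on success, 0 if no context ID is left. */
int sim_comm_create(sim_comm_stats *s)
{
    if (s->live >= SIM_CTX_LIMIT)
        return 0;
    s->live++;
    s->created++;
    return 1;
}

/* Stand-in for a communicator free call: gives the context ID back. */
void sim_comm_free(sim_comm_stats *s)
{
    assert(s->live > 0);
    s->live--;
    s->freed++;
}

/* Issue `iters` creates, freeing each communicator right away only when
 * free_each is nonzero.  Returns the number of successful creates before
 * the first failure, or -1 if every create succeeded. */
int sim_run_loop(int iters, int free_each)
{
    sim_comm_stats s = {0, 0, 0};
    for (int i = 0; i < iters; i++) {
        if (!sim_comm_create(&s))
            return s.created;
        if (free_each)
            sim_comm_free(&s);
    }
    return -1;
}
```

With free_each set, the loop can run for any number of iterations; without it, the first failure comes once SIM_CTX_LIMIT communicators are live.  In a real application the equivalent fix is to make sure every communicator obtained from MPI_Comm_create, MPI_Comm_dup, or MPI_Comm_split is eventually passed to MPI_Comm_free.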

-Dave

On Mar 4, 2013, at 2:03 PM CST, Nichols A. Romero wrote:

> Dave et al.,
> 
> I am still conversing with the application developer about this issue. It looks like I jumped to the wrong conclusion based on the mpiP data that I collected.
> 
> I won't ask you to read the whole post, but if you can, please read the last post by fgygi here:
> http://fpmd.ucdavis.edu/qbox-list/viewtopic.php?f=3&t=219&start=10
> 
> I have asked him to try to create a reduced, MPI-only test case. The code is written on top of ScaLAPACK. There is a for loop that creates a communicator and then destroys it based on its rank on the 2D processor grid. By the time the loop is over, MPI_Comm_create/Comm_free may have been called about 1000 times, but only one of those communicators continues to exist.
> 
> 
> ----- Original Message -----
>> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
>> To: discuss at mpich.org
>> Sent: Monday, February 25, 2013 2:17:04 PM
>> Subject: Re: [mpich-discuss] Maximum number of communicators
>> 
>> I cannot reproduce this with a simple test code. I am convinced that
>> this is an application error. I linked the application against mpiP
>> and I see that the calls to Comm_create outnumber the calls to
>> Comm_free by two to three orders of magnitude.
>> 
>> The application is templated C++ and I suspect that a C++ class
>> destructor is not properly calling Comm_free.
>> 
>> ----- Original Message -----
>>> From: "Dave Goodell" <goodell at mcs.anl.gov>
>>> To: discuss at mpich.org
>>> Sent: Sunday, February 24, 2013 3:36:54 PM
>>> Subject: Re: [mpich-discuss] Maximum number of communicators
>>> 
>>> Test code indeed would be helpful.
>>> 
>>> You may get better warnings about context ID and/or communicator
>>> leaks if you configure a stock MPICH with "--enable-g=all".
>>> 
>>> -Dave
>>> 
>>> On Feb 23, 2013, at 7:05 PM CST, Jeff Hammond wrote:
>>> 
>>>> It's 2048 in all MPICH2-derived implementations that I know about,
>>>> including the ones used on BGP and BGQ.
>>>> 
>>>> Since I know what your issue is already, I think the problem has
>>>> something to do with context id leaking.  You should post your test
>>>> code.
>>>> 
>>>> Jeff
>>>> 
>>>> On Sat, Feb 23, 2013 at 6:24 PM, Nichols A. Romero
>>>> <naromero at alcf.anl.gov> wrote:
>>>>> Hi,
>>>>> 
>>>>> What is the maximum number of communicators in the MPICH that is
>>>>> used on BG/P vs. that used on BG/Q?
>>>>> 
>>>>> Sent from my iPhone
>>>>> _______________________________________________
>>>>> discuss mailing list     discuss at mpich.org
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jeff Hammond
>>>> Argonne Leadership Computing Facility
>>>> University of Chicago Computation Institute
>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>> http://www.linkedin.com/in/jeffhammond
>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>> 
>> 
>> --
>> Nichols A. Romero, Ph.D.
>> Argonne Leadership Computing Facility
>> Argonne National Laboratory
>> Building 240 Room 2-127
>> 9700 South Cass Avenue
>> Argonne, IL 60490
>> (630) 252-3441
>> 
> 
> -- 
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne National Laboratory
> Building 240 Room 2-127
> 9700 South Cass Avenue
> Argonne, IL 60490
> (630) 252-3441
> 