[mpich-discuss] running out of communicators

Ryan Crocker rcrocker at uvm.edu
Tue Mar 19 23:50:22 CDT 2013


I just checked, and I get this error in serial too.  It also happens after a long run, 10-20 thousand iterations.  I ran with Valgrind and didn't see a problem there, but I'm re-running under gdb to see what shows up when it actually crashes.

-Ryan

On Mar 19, 2013, at 8:18 PM, Jim Dinan wrote:

> I don't recall the exact limit on the number of communicators, but it is certainly much more than 21.  Any chance that you're forgetting to close one of those files somewhere?  Can you attach a debugger to confirm that the error is happening in MPI_File_open?
> 
> ~Jim.
> 
> On 3/19/13 8:50 PM, Ryan Crocker wrote:
>> I'm opening 21 files, and I do have the close calls after they are finished being written.
>> 
>> On Mar 19, 2013, at 6:29 PM, Jim Dinan wrote:
>> 
>>> Hi Ryan,
>>> 
>>> MPI_File_open does call MPI_Comm_dup internally.  How many files are you opening?  And are you closing them when you're finished with them?
>>> 
>>> ~Jim.
>>> 
>>> On 3/19/13 8:20 PM, Ryan Crocker wrote:
>>>> Hi Jim,
>>>> 
>>>> That's the thing, I'm not sure where they are being created.  I initialize my MPI environment, then that program never gets called again.  The only thing I can think of is that it's something to do with my MPI IO, but I just double-checked, and every file open has a corresponding file close call.  I also do not call MPI_Comm_dup, unless it is called by another MPI call that I am not aware of.
>>>> 
>>>> -Ryan
>>>> 
>>>> On Mar 19, 2013, at 6:13 PM, Jim Dinan wrote:
>>>> 
>>>>> Hi Ryan,
>>>>> 
>>>>> Every time you call MPI_Comm_dup a new communicator is created.  Are you ever freeing these, using MPI_Comm_free?
>>>>> 
>>>>> Also, what are you trying to achieve by using multiple communicators, and why does it require so many?
>>>>> 
>>>>> Best,
>>>>> ~Jim.
>>>>> 
>>>>> On 3/19/13 7:57 PM, Ryan Crocker wrote:
>>>>>> I realized I forgot to attach the error:
>>>>>> 
>>>>>> Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
>>>>>> PMPI_Comm_dup(176)............: MPI_Comm_dup(comm=0x84000000, new_comm=0x7fff5fbfe9a4) failed
>>>>>> PMPI_Comm_dup(161)............:
>>>>>> MPIR_Comm_dup_impl(55)........:
>>>>>> MPIR_Comm_copy(967)...........:
>>>>>> MPIR_Get_contextid(521).......:
>>>>>> MPIR_Get_contextid_sparse(752): Too many communicators
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I can't seem to find the answer to this question.  I keep getting the failure "Too many communicators".  Could someone explain which calls "use up" communicators, and whether I could be using more than the default total?  I'm trying to debug my code so I can free the communicators rather than repeatedly create them.  I'm a bit perplexed by this, probably from a lack of in-depth knowledge, but I only run my MPI initializations once, and after that all my MPI calls are sums, max, min, allreduce, or alltoall.
>>>>>> 
>>>>>> thanks
>>>>>> 
>>>>>> Ryan Crocker
>>>>>> University of Vermont, School of Engineering
>>>>>> Mechanical Engineering Department
>>>>>> rcrocker at uvm.edu
>>>>>> 315-212-7331
>>>>>> 
>>>>>> _______________________________________________
>>>>>> discuss mailing list     discuss at mpich.org
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>> 

Ryan Crocker
University of Vermont, School of Engineering
Mechanical Engineering Department
rcrocker at uvm.edu
315-212-7331
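
Note for readers of this thread: Jim's point above is that each MPI_File_open duplicates a communicator internally, so an open without a matching close would leak one communicator per call. One way to test that hypothesis is to count opens and closes and check the difference once per iteration. The following is a minimal sketch of such a counter in plain C, with no MPI in it. Every name here (handle_tracker, tracker_acquire, tracker_release, tracker_check, simulate_run) is hypothetical and not part of MPI or MPICH, and simulate_run is only a stand-in for a time-stepping loop, not Ryan's code.

```c
#include <stdio.h>

/* Hypothetical debugging aid; none of these names come from MPI or MPICH. */
typedef struct {
    long acquired;  /* handles created so far */
    long released;  /* handles freed so far */
} handle_tracker;

void tracker_init(handle_tracker *t) {
    t->acquired = 0;
    t->released = 0;
}

/* Intended to be called after each successful MPI_File_open or MPI_Comm_dup. */
void tracker_acquire(handle_tracker *t) { t->acquired++; }

/* Intended to be called after each MPI_File_close or MPI_Comm_free. */
void tracker_release(handle_tracker *t) { t->released++; }

long tracker_outstanding(const handle_tracker *t) {
    return t->acquired - t->released;
}

/* Call once per iteration. Returns 1 and prints a warning when more
 * handles are outstanding than expected_max, otherwise returns 0. */
int tracker_check(const handle_tracker *t, long iteration, long expected_max) {
    long n = tracker_outstanding(t);
    if (n > expected_max) {
        fprintf(stderr,
                "iteration %ld: %ld handles outstanding (expected at most %ld)\n",
                iteration, n, expected_max);
        return 1;
    }
    return 0;
}

/* Stand-in for a time-stepping loop (assumption, no real I/O): each
 * iteration "opens" files_per_step handles and "closes" them again, except
 * that one close is skipped every leak_every iterations (0 = never skip).
 * Returns the number of handles still outstanding at the end. */
long simulate_run(long iterations, int files_per_step, long leak_every) {
    handle_tracker t;
    tracker_init(&t);
    for (long it = 1; it <= iterations; it++) {
        for (int f = 0; f < files_per_step; f++) {
            tracker_acquire(&t);       /* where MPI_File_open would be */
            int skip_close = (f == 0 && leak_every > 0 && it % leak_every == 0);
            if (!skip_close) {
                tracker_release(&t);   /* where MPI_File_close would be */
            }
        }
    }
    return tracker_outstanding(&t);
}
```

In a real code, tracker_acquire would go right after each MPI_File_open and tracker_release right after each MPI_File_close, with something like tracker_check(&t, iteration, 21) at the end of each step (21 being the number of files mentioned above). A count that keeps climbing across iterations would point at a missing close; a count that stays flat would suggest the communicators are being consumed somewhere else.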



