[mpich-discuss] running out of communicators

Jim Dinan dinan at mcs.anl.gov
Wed Mar 20 15:51:19 CDT 2013


Hi Ryan,

If you set a breakpoint at MPIR_Err_return_comm, you should be able to 
see the rest of the call stack where the error occurred.
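For example, one way to do this (a hypothetical invocation, not something from the thread; the rank count and the `./app` executable name are placeholders) is to launch each rank under gdb:

```shell
# Sketch: run the MPI job with each rank under gdb, set a breakpoint at
# MPICH's internal error-reporting routine, then run to completion.
# Assumes the application was built with -g and gdb is available.
mpiexec -n 4 gdb -ex 'break MPIR_Err_return_comm' -ex run ./app
```

When the breakpoint hits, `backtrace` in gdb shows which MPI call raised the error. If the ranks fight over the terminal, a common alternative is `mpiexec -n 4 xterm -e gdb ./app`, one window per rank.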

  ~Jim.

On 3/20/13 3:08 PM, Ryan Crocker wrote:
> So I ran my code through TotalView and it didn't pick up on anything, but I still crash with the same error.  I'm starting to think it's one of my third-party libraries.  I'm using Hypre as my Poisson solver.  Does anybody know if that could be causing the error?
>
> -Ryan
> On Mar 19, 2013, at 8:18 PM, Jim Dinan wrote:
>
>> I don't recall the exact limit on the number of communicators, but it is certainly much more than 21.  Any chance you're forgetting to close one of those files somewhere?  Can you attach a debugger to confirm that the error is happening in MPI_File_open?
>>
>> ~Jim.
>>
>> On 3/19/13 8:50 PM, Ryan Crocker wrote:
>>> I'm opening 21 files, and I do have the close calls after each one is finished being written.
>>>
>>> On Mar 19, 2013, at 6:29 PM, Jim Dinan wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> MPI_File_open does call MPI_Comm_dup internally.  How many files are you opening?  And are you closing them when you're finished with them?
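As a sketch of the pattern Jim describes (generic code, not the poster's; the file names are made up), each open needs a matching close so the internal duplicate is released:

```c
/* Minimal sketch: MPI_File_open duplicates the communicator internally,
 * consuming one context id per open file; MPI_File_close releases it.
 * An open without a matching close leaks one context id per file. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    for (int i = 0; i < 21; i++) {
        char name[32];
        snprintf(name, sizeof name, "out_%02d.dat", i);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, name,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* ... collective writes to fh ... */

        MPI_File_close(&fh);  /* frees the internal communicator dup */
    }

    MPI_Finalize();
    return 0;
}
```

With the close in place, 21 files pose no problem; without it, each iteration permanently consumes a context id until the pool is exhausted.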
>>>>
>>>> ~Jim.
>>>>
>>>> On 3/19/13 8:20 PM, Ryan Crocker wrote:
>>>>> Hi Jim,
>>>>>
>>>>> That's the thing, I'm not sure where they are being created.  I initialize my MPI environment once, and that code never gets called again.  The only thing I can think of is that it has something to do with my MPI I/O, but I just double-checked: after every file open there is a corresponding file close call.  I also never call MPI_Comm_dup directly, unless it is called internally by another MPI routine that I'm not aware of.
>>>>>
>>>>> -Ryan
>>>>>
>>>>> On Mar 19, 2013, at 6:13 PM, Jim Dinan wrote:
>>>>>
>>>>>> Hi Ryan,
>>>>>>
>>>>>> Every time you call MPI_Comm_dup a new communicator is created.  Are you ever freeing these, using MPI_Comm_free?
>>>>>>
>>>>>> Also, what are you trying to achieve by using multiple communicators, and why does it require so many?
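For illustration (a generic sketch, not code from this thread), the dup/free pairing Jim is asking about looks like:

```c
/* Sketch: each MPI_Comm_dup consumes a context id from a finite pool;
 * MPI_Comm_free returns it.  Dup'ing repeatedly without freeing
 * eventually fails with "Too many communicators". */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm work;
    MPI_Comm_dup(MPI_COMM_WORLD, &work);   /* allocates a context id */

    /* ... use 'work' for a library call or an I/O phase ... */

    MPI_Comm_free(&work);                  /* returns the context id */

    MPI_Finalize();
    return 0;
}
```

The same accounting applies to communicators created implicitly on your behalf, e.g. by MPI_File_open, which is why unmatched opens can exhaust the pool even though the code never calls MPI_Comm_dup itself.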
>>>>>>
>>>>>> Best,
>>>>>> ~Jim.
>>>>>>
>>>>>> On 3/19/13 7:57 PM, Ryan Crocker wrote:
>>>>>>> I realized I forgot to attach the error:
>>>>>>>
>>>>>>> Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
>>>>>>> PMPI_Comm_dup(176)............: MPI_Comm_dup(comm=0x84000000, new_comm=0x7fff5fbfe9a4) failed
>>>>>>> PMPI_Comm_dup(161)............:
>>>>>>> MPIR_Comm_dup_impl(55)........:
>>>>>>> MPIR_Comm_copy(967)...........:
>>>>>>> MPIR_Get_contextid(521).......:
>>>>>>> MPIR_Get_contextid_sparse(752): Too many communicators
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> So I can't seem to find the answer to this question.  I keep getting the failure "Too many communicators".  Could someone explain which calls "use up" communicators, and whether I could be creating more than the default limit?  I'm trying to debug my code so I can free the communicators rather than just repeatedly creating them.  I'm a bit perplexed by this, probably from a lack of in-depth knowledge, but I only run my MPI initialization once, and all my other MPI calls are sums, max, min, allreduce, and alltoall.
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> Ryan Crocker
>>>>>>> University of Vermont, School of Engineering
>>>>>>> Mechanical Engineering Department
>>>>>>> rcrocker at uvm.edu
>>>>>>> 315-212-7331
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> discuss mailing list     discuss at mpich.org
>>>>>>> To manage subscription options or unsubscribe:
>>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>>>
>>>>>
>>>
>


