[mpich-discuss] discuss Digest, Vol 9, Issue 13

Wesley Bland wbland at mcs.anl.gov
Mon Jul 8 10:25:38 CDT 2013


That's the correct way to set that environment variable, but I would have to agree with Pavan here. If you're running into the cap on context IDs (especially with so few processes), it seems likely that you're creating a lot of communicators that you don't need. Are you sure you're freeing them correctly after use?
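For illustration, here is a minimal C sketch of the general pattern (not FLASH's actual code, just an assumed example): every communicator you create consumes one of the 2048 context IDs on each process, and only MPI_Comm_free gives it back.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    for (int i = 0; i < 10000; i++) {
        MPI_Comm subcomm;
        MPI_Comm_create(MPI_COMM_WORLD, world_group, &subcomm);

        /* ... do work on subcomm ... */

        /* Without this free, every iteration leaks a context ID and the
           run eventually dies with "Cannot allocate context ID". */
        if (subcomm != MPI_COMM_NULL)
            MPI_Comm_free(&subcomm);
    }

    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}

The same rule applies in Fortran: every MPI_COMM_CREATE, MPI_COMM_SPLIT, or MPI_COMM_DUP needs a matching MPI_COMM_FREE once the communicator is no longer in use.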

Wesley

On Jul 8, 2013, at 10:05 AM, Reem Alraddadi <raba500 at york.ac.uk> wrote:

> Hi Wesley.
> I wrote the following:
> mpirun --np 4 --env MPIR_PARAM_CTXID_EAGER_SIZE 1 ./flash4
> but the error is still the same. Did I do it the wrong way?
> 
> Thanks,
> Reem
> 
> Message: 5
> Date: Mon, 8 Jul 2013 08:14:48 -0500
> From: Wesley Bland <wbland at mcs.anl.gov>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] mpich on Mac os x
> Message-ID: <8DC984B2-4E4B-4BFE-806E-203463A7A4E4 at mcs.anl.gov>
> Content-Type: text/plain; charset=iso-8859-1
> 
> It seems that you're creating more communicators than MPICH can handle. You might be able to get around this by setting the environment variable MPIR_PARAM_CTXID_EAGER_SIZE to something smaller than its default (which is 2). That frees up a few more communicators, but there is a pathological case where, even with fewer communicators than the max, MPICH won't be able to agree on a new communicator ID when needed. Try changing that environment variable and see if that fixes things.
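> 
> For example (assuming a bash-style shell and the Hydra mpiexec that ships with MPICH; the exact flag spelling may differ on other launchers), either of these forms should work:
> 
>   export MPIR_PARAM_CTXID_EAGER_SIZE=1
>   mpiexec -n 4 ./flash4
> 
> or, setting it only for that run:
> 
>   mpiexec -n 4 -env MPIR_PARAM_CTXID_EAGER_SIZE 1 ./flash4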
> 
> Wesley
> 
> On Jul 8, 2013, at 5:33 AM, Reem Alraddadi <raba500 at york.ac.uk> wrote:
> 
> > Hi all,
> > I am using mpich-3.0.4 on Mac OS X version 10.7.5 to run the FLASH code. It works fine at the beginning of the run, and then I get the following error:
> >
> > Fatal error in MPI_Comm_create: Other MPI error, error stack:
> > MPI_Comm_create(600).................: MPI_Comm_create(comm=0x84000002, group=0xc8001349, new_comm=0x7fff606a8614) failed
> > MPI_Comm_create(577).................:
> > MPIR_Comm_create_intra(241)..........:
> > MPIR_Get_contextid(799)..............:
> > MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID because of fragmentation (169/2048 free on this process; ignore_id=0)
> > Fatal error in MPI_Comm_create: Other MPI error, error stack:
> > MPI_Comm_create(600).................: MPI_Comm_create(comm=0x84000002, group=0xc80012b6, new_comm=0x7fff670cc614) failed
> > MPI_Comm_create(577).................:
> > MPIR_Comm_create_intra(241)..........:
> > MPIR_Get_contextid(799)..............:
> > MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID because of fragmentation (316/2048 free on this process; ignore_id=0)
> > Fatal error in MPI_Comm_create: Other MPI error, error stack:
> > MPI_Comm_create(600).................: MPI_Comm_create(comm=0x84000004, group=0xc800000e, new_comm=0x7fff629d5614) failed
> > MPI_Comm_create(577).................:
> > MPIR_Comm_create_intra(241)..........:
> > MPIR_Get_contextid(799)..............:
> > MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID because of fragmentation (2020/2048 free on this process; ignore_id=0)
> > Fatal error in MPI_Comm_create: Other MPI error, error stack:
> > MPI_Comm_create(600).................: MPI_Comm_create(comm=0x84000002, group=0xc8000020, new_comm=0x7fff639ae614) failed
> > MPI_Comm_create(577).................:
> > MPIR_Comm_create_intra(241)..........:
> > MPIR_Get_contextid(799)..............:
> > MPIR_Get_contextid_sparse_group(1146):  Cannot allocate context ID because of fragmentation (2002/2048 free on this process; ignore_id=0
> >
> > Is there a way to fix that?
> >
> > Thanks,
> > Reem
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> 
