[mpich-discuss] Memory leak comparing communicators

Rob Latham robl at mcs.anl.gov
Thu May 1 10:29:18 CDT 2014



On 05/01/2014 09:29 AM, VandeVondele Joost wrote:
> Actually, the issue seems to be with MPI_Comm_group, since MPI_Group_translate_ranks can simply be commented out and the leak remains.

Your test case is missing two calls to MPI_Group_free: the group handles 
obtained from MPI_Comm_group are never released.

Building against an mpich compiled with --enable-g=all makes this 
instantly clear, so thanks for the test case.
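
For illustration, a sketch of how the end of the loop body in 2d.c could 
look with the two missing frees added (untested; MPI_Group_free releases 
the group handles that MPI_Comm_group returns):

        MPI_Group_translate_ranks(g1, 2, rin, g2, rout);
        MPI_Group_free(&g1);   /* release the group obtained from dup_comm_world */
        MPI_Group_free(&g2);   /* release the group obtained from comm_cart */
        MPI_Comm_free(&comm_cart);
        MPI_Comm_free(&dup_comm_world);

With those two calls in place, the leak report pointing at 
MPIR_Group_create should go away.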

==rob

>
> Joost
>
> ________________________________________
> From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of "Antonio J. Peña" [apenya at mcs.anl.gov]
> Sent: Thursday, May 01, 2014 4:08 PM
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Memory leak comparing communicators
>
> Thank you, Joost. We'll check this issue. I'll keep you posted.
>
>
> On 05/01/2014 09:02 AM, VandeVondele Joost wrote:
>> Hi Antonio,
>>
>> I finally reproduced this in a small program. The culprit actually seems to be MPI_Group_translate_ranks, but the leak shows up with a trace to MPI_Comm_compare:
>>
>> Direct leak of 3072 byte(s) in 96 object(s) allocated from:
>>       #0 0x7fdab2e213a8 in __interceptor_malloc ../../../../gcc/libsanitizer/lsan/lsan_interceptors.cc:66
>>       #1 0x7fdab3efaf6b in MPIR_Group_create /data/vjoost/mpich-3.1/src/mpi/group/grouputil.c:77
>>       #2 0x7fdab3fc8314 in MPIR_Comm_group_impl /data/vjoost/mpich-3.1/src/mpi/comm/comm_group.c:44
>>       #3 0x7fdab3fc40d3 in PMPI_Comm_compare /data/vjoost/mpich-3.1/src/mpi/comm/comm_compare.c:122
>>       #4 0x4009de in main /data/vjoost/mpich-3.1/debug/2d.c:23
>>
>>> cat 2d.c
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char **argv)  /* needs 2 ranks */
>> {
>>      MPI_Comm comm_cart, dup_comm_world;
>>      int result,reorder;
>>      int dim[2], period[2], rin[2], rout[2];
>>      int i;
>>      MPI_Group g1, g2;
>>
>>      MPI_Init (&argc, &argv);
>>
>>      for(i=0;i<100;i++)
>>      {
>>
>>         MPI_Comm_dup( MPI_COMM_WORLD, &dup_comm_world);
>>         dim[0]=2; dim[1]=1;
>>         reorder=0;
>>         period[0]=1; period[1]=1;
>>         rin[0]=0; rin[1]=1;
>>         MPI_Cart_create(dup_comm_world, 2, dim, period, reorder, &comm_cart);
>>         MPI_Comm_compare(dup_comm_world, comm_cart, &result );
>>         MPI_Comm_group(dup_comm_world,&g1);
>>         MPI_Comm_group(comm_cart,&g2);
>>         MPI_Group_translate_ranks(g1, 2, rin, g2, rout);
>>         MPI_Comm_free(&comm_cart);
>>         MPI_Comm_free(&dup_comm_world);
>>
>>      }
>>
>>      MPI_Finalize();
>>
>>      return 0;
>> }
>>
>> Thanks,
>>
>> Joost
>>
>>
>>
>>
>> ________________________________________
>> From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of "Antonio J. Peña" [apenya at mcs.anl.gov]
>> Sent: Monday, April 28, 2014 6:26 PM
>> To: discuss at mpich.org
>> Subject: Re: [mpich-discuss] Memory leak comparing communicators
>>
>> Hi Joost,
>>
>> Can you share the smallest fragment of code with which we can reproduce
>> this?
>>
>>      Antonio
>>
>>
>> On 04/28/2014 11:25 AM, VandeVondele Joost wrote:
>>> Hi,
>>>
>>> Using mpich 3.1, I notice that my application leaks memory. Compiling it with '-O1 -g -fno-omit-frame-pointer -fsanitize=leak' using gcc 4.9, it appears the source of the leak is comparing communicators, for example:
>>>
>>> [0] Direct leak of 38400 byte(s) in 600 object(s) allocated from:
>>> [0]     #0 0x7f84298d53a8 in __interceptor_malloc ../../../../gcc/libsanitizer/lsan/lsan_interceptors.cc:66
>>> [0]     #1 0x7f842aabaf6b in MPIR_Group_create /data/vjoost/mpich-3.1/src/mpi/group/grouputil.c:77
>>> [0]     #2 0x7f842ab88314 in MPIR_Comm_group_impl /data/vjoost/mpich-3.1/src/mpi/comm/comm_group.c:44
>>> [0]     #3 0x7f842ab840d3 in PMPI_Comm_compare /data/vjoost/mpich-3.1/src/mpi/comm/comm_compare.c:122
>>> [0]     #4 0x7f842aac8e84 in pmpi_comm_compare_ /data/vjoost/mpich-3.1/src/binding/f77/comm_comparef.c:267
>>> [0]     #5 0x1bfd447 in __message_passing_MOD_mp_comm_compare /data/vjoost/clean/cp2k/cp2k/src/common/message_passing.F:1084
>>>
>>> I'm not seeing any other leaks, so I guess something particular is going on here. Any suggestions for a possible fix?
>>>
>>> Thanks,
>>>
>>> Joost
>>
>> --
>> Antonio J. Peña
>> Postdoctoral Appointee
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>> 9700 South Cass Avenue, Bldg. 240, Of. 3148
>> Argonne, IL 60439-4847
>> apenya at mcs.anl.gov
>> www.mcs.anl.gov/~apenya
>>
>
>
> --
> Antonio J. Peña
> Postdoctoral Appointee
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 9700 South Cass Avenue, Bldg. 240, Of. 3148
> Argonne, IL 60439-4847
> apenya at mcs.anl.gov
> www.mcs.anl.gov/~apenya
>
>

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


