[mpich-discuss] double free error after MPI run finished

John Bray jbray at allinea.com
Tue Nov 19 03:36:37 CST 2013


I can confirm that 3.1rc2 fixes my problem.

Thanks for your help.

John

On 9 November 2013 08:19, John Bray <jbray at allinea.com> wrote:
> Happy to validate any fix when it's ready, not only for this test case,
> but with our full DDT test suite.
>
> John
>
> On 8 November 2013 19:44, Antonio J. Peña <apenya at mcs.anl.gov> wrote:
>>
>> Hi John,
>>
>> You seem to describe a known bug:
>>
>> https://trac.mpich.org/projects/mpich/ticket/1932
>>
>> We've addressed it and the fix is waiting for review before going to the master
>> branch. It would be nice if you could validate it once it's in master. May I
>> contact you then to see if that addresses your problem?
>>
>>   Antonio
>>
>>
>> On Friday, November 08, 2013 05:43:29 PM John Bray wrote:
>>> I've got some sample code that I'm using to test our debugger DDT. It
>>> works with Open MPI and Intel MPI, but with every MPICH release from
>>> MPICH2 onwards to 3.1rc1 I get an error after MPI_FINALIZE. Our
>>> memory debugging library claims it's at
>>>
>>> SendqFreePool (dbginit.c:386) trying to free previously freed pointer
>>>
>>> with a stack trace of
>>> auditmpi
>>>   pmpi_finalize__
>>>     PMPI__finalize
>>>       MPIR_Call_finalize_callbacks
>>>         sendqFreePool (dbginit.c)
>>>
>>> When run from a raw mpirun command outside DDT, I get:
>>>
>>> *** glibc detected *** ./auditmpibad_f_debug.exe: double free or corruption (fasttop): 0x1032a5c8 ***
>>> *** glibc detected *** ./auditmpibad_f_debug.exe: double free or corruption (fasttop): 0x104da5c8 ***
>>> ======= Backtrace: =========
>>> ======= Backtrace: =========
>>> /lib/libc.so.6(+0xfe750e4)[0xf78e50e4]
>>> /home/jbray/prog/mpich/mpich-3.1rc1/rhel-6-ppc64_ibm/install/lib/libmpich.so.11(+0x1a8b1c)[0xfec8b1c]
>>> /lib/libc.so.6(+0xfe750e4)[0xf76c50e4]
>>> /home/jbray/prog/mpich/mpich-3.1rc1/rhel-6-ppc64_ibm/install/lib/libmpich.so.11(+0x1a8b1c)[0xfec8b1c]
>>> ... and so on
>>>
>>> I've configured with
>>>
>>> ./configure --prefix=$PWD/install --enable-shared --enable-fast=all --enable-debuginfo
>>>
>>> and compile with
>>>
>>> mpif90 -o auditmpibad_f_debug.exe auditmpibad.F90 -O0 -g
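>>>
>>> and run with something like the command below (the code pairs even and
>>> odd ranks, so an even number of processes is needed; two assumed here):
>>>
>>> mpirun -n 2 ./auditmpibad_f_debug.exe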
>>>
>>> It works if the ALLTOALLV is only called once.
>>>
>>> Is this a bug or a misunderstanding on my part? The code is:
>>>
>>> program auditmpi
>>>
>>>   use mpi, only :       &
>>>     MPI_COMM_WORLD,     &
>>>     MPI_REAL,           &
>>>     MPI_STATUS_SIZE,    &
>>>     MPI_BSEND_OVERHEAD, &
>>>     MPI_SUM,            &
>>>     MPI_UNDEFINED
>>>
>>>   implicit none
>>>
>>>   integer, parameter :: repeats=2
>>>
>>>   integer :: rank, nproc,ierr,rpt
>>>   real    :: input(100000),output(100000)
>>>   integer :: status (MPI_STATUS_SIZE)
>>>   integer :: statuses (MPI_STATUS_SIZE,repeats)
>>>   integer :: request
>>>   logical :: odd
>>>   integer, allocatable :: recvcounts(:),sendcounts(:),displs(:)
>>>
>>>   call MPI_INIT(ierr)
>>>
>>>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>>   call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
>>>
>>>   if (rank == 0) print *,"auditmpi start"
>>>
>>>   if (mod(rank,2) == 0) then
>>>     odd=.false.
>>>   else
>>>     odd=.true.
>>>   end if
>>>
>>>   allocate (recvcounts(nproc))
>>>   recvcounts(:) = 100000/nproc
>>>
>>>   allocate (sendcounts(nproc))
>>>   sendcounts(:) = 100000/nproc
>>>
>>>   allocate (displs(nproc))
>>>   displs(:)=0 ! don't care if we overlap
>>>
>>>   if ( .NOT. odd) then
>>>     call MPI_IRECV(output,size(output),MPI_REAL,rank+1,1, &
>>>                    MPI_COMM_WORLD,request,ierr)
>>>   end if
>>>   call MPI_BARRIER(MPI_COMM_WORLD,ierr)
>>>   if (odd) then
>>>     call MPI_IRSEND(input,size(input),MPI_REAL,rank-1,1, &
>>>                     MPI_COMM_WORLD,request,ierr)
>>>   end if
>>>   call MPI_WAIT(request,status,ierr)
>>>
>>>   do rpt=1,2
>>>     call MPI_ALLTOALLV(input,sendcounts,displs,MPI_REAL, &
>>>                        output,recvcounts,displs,MPI_REAL, &
>>>                        MPI_COMM_WORLD,ierr)
>>>   end do
>>>
>>>   call MPI_BARRIER(MPI_COMM_WORLD,ierr)
>>>
>>>   deallocate (recvcounts)
>>>   deallocate (sendcounts)
>>>   deallocate (displs)
>>>
>>>   if (rank == 0) print *,"auditmpi finished"
>>>
>>>   call MPI_FINALIZE(ierr)
>>>
>>> end program auditmpi
>>>
>>> John
>>> _______________________________________________
>>> discuss mailing list     discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>


