[mpich-discuss] Debugging mpi program using mpich

Mathieu Dutour mathieu.dutour at gmail.com
Wed Feb 27 03:03:55 CST 2013


Thank you very much! That will be helpful.
Still, what I meant was something a little different.
I would like to have a mode in MPI where every little error is reported,
for example:
"process 1 has sent data of length 10 but process 0 needed only 9,
at lines 400 and 500"
"the MPI request posted at line 240 has not been completed"
"a process has exited without calling MPI_Finalize"
"MPI_Send uses MPI_FLOAT but the corresponding MPI_Recv uses MPI_DOUBLE"

Deadlock (a deadly embrace) could also be detected by the MPI library, I think.
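
Something like the following (again a hypothetical two-rank program)
simply hangs today, where a checking mode or a timeout could instead
report the cycle:

----8<----
/* Hypothetical deadly embrace: both ranks post a blocking receive before
 * their send, so neither call can ever complete. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, other, in, out = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;   /* assumes exactly two ranks */
    MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
----8<----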

All of this kind of checking would be extremely helpful in practice, and I
would gladly sacrifice speed if it avoided the long, thankless work of
debugging.

  Mathieu


On Tue, Feb 26, 2013 at 5:38 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:

> You can run your application under valgrind with a debug build of MPICH
> and you should be able to eventually find your leaks that way (though the
> output might be a bit hard to read at first, since you will be tracking
> leaks in memory associated with the MPI object, not the object itself).
>
> Just run your app like this:
>
> ----8<----
> mpiexec -n NUMPROCS valgrind /path/to/your/app
> ----8<----
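>
> (If you build MPICH yourself, a debug build roughly along these lines
> usually gives the most readable valgrind output; check the README of your
> MPICH version for the exact configure options:)
>
> ----8<----
> ./configure --prefix=/path/to/mpich-debug-install --enable-g=dbg CFLAGS=-O0
> make && make install
> ----8<----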
>
> If the valgrind outputs from the different processes get too jumbled, you
> can separate them by passing the "--log-file" option to Valgrind, like this:
>
> ----8<----
> mpiexec -n NUMPROCS valgrind --log-file='vg_out.%q{PMI_RANK}' /path/to/app
> ----8<----
>
> This will deposit one file per process, suffixed by the rank in
> MPI_COMM_WORLD.
>
> -Dave
>
> On Feb 26, 2013, at 10:17 AM CST, Mathieu Dutour wrote:
>
> > Thank you, I will try TotalView.
> > Now another question:
> > ---In C we can run a program with valgrind and it tells you exactly
> > where memory is lost, where you use uninitialized values, and so on.
> > ---In Fortran with ifort you can compile with
> > -warn interfaces,nouncalled -fpp -gen-interface -g -traceback -check
> > uninit -check bounds -check pointers
> > and you can do the same.
> >
> > It would be extremely helpful to have similar tools in MPI that
> > sacrifice speed and allow you to find any error at runtime.
> >
> >   Mathieu
> >
> >
> > On Tue, Feb 26, 2013 at 5:01 PM, Dave Goodell <goodell at mcs.anl.gov>
> wrote:
> > On Feb 26, 2013, at 9:39 AM CST, Mathieu Dutour wrote:
> >
> > > I used mpich-3.0.1 with debugging options and the program ran correctly,
> > > but at the end it returned the errors indicated below.
> > > I thank MPICH for finding errors that other MPI implementations did not
> > > find, but I wonder if there is a way to turn this into more useful
> > > debugging information.
> >
> > High-quality patches to improve the output are welcome.  We primarily
> > view these leak-checking messages as tools for us (the core developers
> > of MPICH), not for end-user consumption.  So we probably won't spend any
> > time changing these messages ourselves.
> >
> > >   Mathieu
> > >
> > > PS: The errors returned on exit:
> > > leaked context IDs detected: mask=0x9d7380 mask[0]=0x3fffffff
> > > In direct memory block for handle type GROUP, 3 handles are still
> > > allocated
> >
> > […]
> >
> > In case you have not yet found your bug, these messages indicate that
> > you are leaking MPI objects, especially communicators, groups, and
> > datatypes.  It could be that they are leaked indirectly because you have
> > not completed an outstanding request (via MPI_Wait or similar), as
> > indicated by the lines with "REQUEST" in them.
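> >
> > (As an illustration only, a pattern roughly like the one below, where the
> > request from a nonblocking call is never completed with MPI_Wait/MPI_Test
> > nor released with MPI_Request_free, produces exactly those REQUEST leak
> > reports at MPI_Finalize:)
> >
> > ----8<----
> > /* Illustrative two-rank sketch: rank 0 never completes the request
> >  * returned by MPI_Irecv, so the request handle is still allocated at
> >  * MPI_Finalize and shows up in MPICH's leak report. */
> > #include <mpi.h>
> >
> > int main(int argc, char **argv)
> > {
> >     int rank, buf = 7;
> >     MPI_Request req;
> >
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     if (rank == 0) {
> >         MPI_Irecv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
> >         /* forgotten: MPI_Wait(&req, MPI_STATUS_IGNORE); */
> >     } else if (rank == 1) {
> >         MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
> >     }
> >     MPI_Barrier(MPI_COMM_WORLD);   /* let the Irecv match before finalizing */
> >     MPI_Finalize();
> >     return 0;
> > }
> > ----8<----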
> >
> > -Dave

