[mpich-discuss] Debugging mpi program using mpich

Dave Goodell goodell at mcs.anl.gov
Tue Feb 26 10:38:52 CST 2013


You can run your application under valgrind with a debug build of MPICH, and you should eventually be able to find your leaks that way (though the output might be a bit hard to read at first, since you will be tracking leaks in the memory associated with the MPI object, not the object itself).

Just run your app like this:

----8<----
mpiexec -n NUMPROCS valgrind /path/to/your/app
----8<----

If the valgrind output from the different processes gets too jumbled, you can separate it by passing the "--log-file" option to valgrind, like this:

----8<----
mpiexec -n NUMPROCS valgrind --log-file='vg_out.%q{PMI_RANK}' /path/to/app
----8<----

This will deposit one file per process, suffixed by the rank in MPI_COMM_WORLD.
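
For reference, here is a minimal, hypothetical sketch (not taken from the original report) of the kind of code that produces these reports: a communicator that is duplicated but never freed.  The names are made up for illustration; the fix is the MPI_Comm_free call before MPI_Finalize.

----8<----
/* leak_example.c -- hypothetical illustration of a leaked communicator.
 * With a debug build of MPICH, the leak checker reports the handle as
 * still allocated at MPI_Finalize, and valgrind reports the memory
 * behind it as lost or still reachable at exit. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm dup;

    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, &dup);

    /* ... use dup ... */

    /* Omitting this call leaks the communicator (and its context ID): */
    MPI_Comm_free(&dup);

    MPI_Finalize();
    return 0;
}
----8<----

Compiling this with "mpicc -g" and running it through the mpiexec/valgrind command above, with and without the MPI_Comm_free call, is a quick way to see what the reports look like for a known leak.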

-Dave

On Feb 26, 2013, at 10:17 AM CST, Mathieu Dutour wrote:

> Thank you, I will try TotalView.
> Now another question:
> --- In C we can run the program with valgrind and it tells you exactly where memory is lost, where you use uninitialized values, and so on.
> --- In Fortran with ifort you can compile with
> -warn interfaces,nouncalled -fpp -gen-interface -g -traceback -check uninit -check bounds -check pointers
> and do the same.
> 
> It would be extremely helpful to have similar tools in MPI that sacrifice speed
> and allow you to find any error at runtime.
> 
>   Mathieu
> 
> 
> On Tue, Feb 26, 2013 at 5:01 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> On Feb 26, 2013, at 9:39 AM CST, Mathieu Dutour wrote:
> 
> > I used mpich-3.0.1 with debugging options and the program ran correctly,
> > but at the end it returned some errors, indicated below.
> > I thank mpich for finding those errors that other MPI implementations did
> > not find, but I wonder if there is a way to transform this into more useful
> > debugging information.
> 
> High-quality patches to improve the output are welcome.  We primarily view these leak-checking messages as tools for us (the core developers of MPICH), not for end-user consumption, so we probably won't spend time changing these messages ourselves.
> 
> >   Mathieu
> >
> > PS: The errors returned at exit:
> > leaked context IDs detected: mask=0x9d7380 mask[0]=0x3fffffff
> > In direct memory block for handle type GROUP, 3 handles are still allocated
> 
> […]
> 
> In case you have not yet found your bug, these messages indicate that you are leaking MPI objects, in particular communicators, groups, and datatypes.  They may be leaked indirectly because you have not completed an outstanding request (via MPI_Wait or similar), as indicated by the lines with "REQUEST" in them.
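> 
> As a rough, hypothetical sketch (not taken from your program), the REQUEST and GROUP lines usually correspond to code like the following; the fix is to complete each request and free each group before MPI_Finalize:
> 
> ----8<----
> /* Hypothetical example of REQUEST and GROUP handles that would be
>  * reported as leaked if the three cleanup calls were omitted. */
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     int sendval = 42, recvval = 0, rank;
>     MPI_Request sreq, rreq;
>     MPI_Group grp;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     /* Each rank sends one integer to itself. */
>     MPI_Irecv(&recvval, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &rreq);
>     MPI_Isend(&sendval, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &sreq);
>     MPI_Comm_group(MPI_COMM_WORLD, &grp);
> 
>     /* Cleanup: without these calls the handles are still allocated
>      * at MPI_Finalize and show up in the leak-checking output. */
>     MPI_Wait(&rreq, MPI_STATUS_IGNORE);
>     MPI_Wait(&sreq, MPI_STATUS_IGNORE);
>     MPI_Group_free(&grp);
> 
>     MPI_Finalize();
>     return 0;
> }
> ----8<----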
> 
> -Dave
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss



