[mpich-discuss] Totalview and Mpich3

Peter Thompson peter.thompson at roguewave.com
Thu Jun 4 09:52:13 CDT 2015


This is a summary of my reply to the person who posted the original query. I had
fallen off the list when my email address changed and the old address was no
longer forwarded, but I was in touch with the user and got them running, so I
thought I'd pass on what I found. The original post reported problems starting
TotalView with

mpirun -tv

This method currently appears to be broken. The 3.1.4 manual suggests other
ways of starting TotalView. It has

totalview -a mpirun -a -n 3 ./foo

but then suggests you can use an 'indirect' launch with

totalview mpirun -a -n 3 ./foo

The first method is just wrong and won't work. The -a indicates that anything
after it is an argument for the target program being debugged. This example has
two '-a' flags, with the first one before mpirun, so TotalView never sees a
program to debug. I think the behavior is undefined.

The second method should work, but it does not invoke the 'indirect method'.
Instead it is the 'classic' or 'direct' method: it starts TotalView on mpirun,
and when the job starts, TotalView attaches to the processes that are
launched. This method is tried and true, and it is scalable. I prefer a
variation of this method, which is

totalview -args mpirun -np 3 ./foo

It's the same functionality, but you don't have to remember to place the -a
after mpirun. You can get the mpirun command working correctly on its own, and
then prepend

totalview -args

to it.
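
As a rough illustration of that "prepend" step, here is a minimal shell sketch.
The helper name tv_cmd is hypothetical (it is not part of TotalView or MPICH),
and it only prints the resulting command line rather than launching anything,
so the result can be checked before it is run by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of TotalView or MPICH): print the
# TotalView command line for an mpirun command that already works.
tv_cmd() {
    # "$@" is the complete mpirun command line, e.g. mpirun -np 3 ./foo
    printf 'totalview -args'
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Print the debugger command for the example used in this message.
tv_cmd mpirun -np 3 ./foo
```

The point is only that nothing inside the mpirun command has to change; the
debugger invocation goes in front of it as-is.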

The indirect method works as well, but it is not scalable, as it launches a
debug server for each process rather than one for each node. To start it, one
could simply type

totalview

or maybe

totalview foo

The first of these (plain 'totalview') will bring up the session manager, and
you would choose to start a new parallel session. It will show a screen where
you can pick the parallel starter. For MPICH 3.1, you could pick either MPICH2
or MPICH3, and they should work. The plain MPICH entry will not, as that goes
back to the original MPICH of the 1.2 era. Then pick the number of tasks. You
may have to add the additional arguments of -f hostfile to run across different
nodes, but I don't think setting the number of nodes really makes a difference.

If you use 'totalview foo' the method is similar, but you need to go to the 
parallel tab to pick which parallel system you are using.

If you want some help with updating the docs, just let me know.  ;-)

PeterT

-- 
Peter Thompson | Principal Technical Support Engineer |
Rogue Wave Software, Inc.
Accelerating Great Code


| P 508-652-7734 | F 508-652-7701 |
www.roguewave.com / peter.thompson at roguewave.com




