[mpich-discuss] Totalview and Mpich3
Peter Thompson
peter.thompson at roguewave.com
Thu Jun 4 09:52:13 CDT 2015
This is a summary of my reply to the person who posted the original query. I'd
fallen off the list when my email address changed and the old address was no
longer forwarded. But I was in touch and got the user running so I thought I'd
pass on what I found. The original post reported problems starting TotalView with
mpirun -tv
This method appears to be broken at present. The 3.1.4 manual suggests
other ways of starting TotalView. It has
totalview -a mpirun -a -n 3 ./foo
but then suggests you can use an 'indirect' launch with
totalview mpirun -a -n 3 ./foo
The first method is just wrong and won't work. The -a indicates that anything
after the -a is an argument for the target program being debugged. This
example has two '-a' flags, with the first one before mpirun, so TotalView
doesn't see a program to debug. I think the behavior is undefined.
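To make the '-a' rule concrete, here is a toy shell sketch of the splitting described above. It is only an illustration of the rule as stated in this message, not TotalView's actual argument parser; the function name split_at_a is made up for this example.

```shell
#!/bin/sh
# Toy model (an assumption, not TotalView's real parser): the last word
# before the first -a is treated as the program to debug, and everything
# after that -a is passed to the program as its arguments.
split_at_a() {
    target=""
    while [ $# -gt 0 ]; do
        if [ "$1" = "-a" ]; then
            shift          # drop the -a itself
            break          # the remaining words belong to the target
        fi
        target="$1"
        shift
    done
    echo "target=${target:-<none>} args=$*"
}

# The form that works: mpirun is the target, "-n 3 ./foo" are its arguments.
split_at_a mpirun -a -n 3 ./foo

# The form from the manual: -a comes first, so no target is ever named.
split_at_a -a mpirun -a -n 3 ./foo
```

Under this model the second call reports no target at all, which matches the problem described above.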
The second method should work, but it does not invoke the 'indirect method'.
Instead it is the 'classic' or 'direct' method. It will start up TotalView on
mpirun, and when the job starts, it will attach to the processes that are
launched. This method is tried and true, and it is scalable. I prefer a
variation of this method which is
totalview -args mpirun -np 3 ./foo
It's the same functionality, but you don't have to remember to place the -a
after mpirun. You can get the mpirun command working on its own first, and
then prepend
totalview -args
to it.
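As a rough sketch of that workflow, a small shell wrapper could take an mpirun command line that already works and prepend 'totalview -args' to it. The names tv_wrap and DRY_RUN are hypothetical helpers invented for this example; only 'totalview -args' and the mpirun line come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: prepend "totalview -args" to an mpirun command
# line that has already been tested on its own.
# With DRY_RUN=1 the command is printed instead of being run.
tv_wrap() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "totalview -args $*"
    else
        totalview -args "$@"
    fi
}

# Print the command that would be run for the example from this message.
DRY_RUN=1
tv_wrap mpirun -np 3 ./foo
```

Running it with DRY_RUN unset (or 0) would start TotalView on mpirun directly, assuming totalview is on the PATH.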
The indirect method works as well, but it is not scalable, as it launches a
debug server for each process rather than for each node. To start it, one
could simply type
totalview
or maybe
totalview foo
The first form (plain 'totalview') will bring up the session manager, and you
would choose to start a new parallel session. It will show a screen where you
can pick the parallel starter. For MPICH 3.1, you could pick either MPICH2 or
MPICH3, and both should work. The plain MPICH entry will not, as that goes
back to the original MPICH of the 1.2 era. Then pick the number of tasks. You
may have to add the additional arguments of -f hostfile to run across
different nodes, but I don't think setting the number of nodes really makes a
difference.
If you use 'totalview foo' the method is similar, but you need to go to the
parallel tab to pick which parallel system you are using.
If you want some help with updating the docs, just let me know. ;-)
PeterT
--
Peter Thompson | Principal Technical Support Engineer |
Rogue Wave Software, Inc.
Accelerating Great Code
| P 508-652-7734 | F 508-652-7701 |
www.roguewave.com / peter.thompson at roguewave.com