[mpich-discuss] Bug in HYDT_dbg_setup_procdesc
Chris January
chris.january at allinea.com
Tue Apr 30 04:59:17 CDT 2013
Hello,
We (Allinea) have noticed a bug in HYDT_dbg_setup_procdesc, introduced
between 3.0.2 and 3.0.3 by this commit:
http://trac.mpich.org/projects/mpich/changeset/e04dd4b64ff618f2df58789265b741a8e9fab081/
When debugging a 4-process job on a 32-core machine using DDT, we find
that the 4 entries in MPIR_proctable all have the same pid.
Here is how to reproduce the issue outside of DDT:
jbray@mic3:31053% gdb --args mpirun -np 4 wave_f.exe
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/jbray/prog/mpich/mpich-3.0.3/mic_gnu/install/bin/mpirun...done.
(gdb) break MPIR_Breakpoint
Breakpoint 1 at 0x428a70: file ./tools/debugger/debugger.c, line 25.
(gdb) r
Starting program: /home/jbray/prog/mpich/mpich-3.0.3/mic_gnu/install/bin/mpirun -np 4 wave_f.exe
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 107606.
Breakpoint 1, MPIR_Breakpoint () at ./tools/debugger/debugger.c:25
25 }
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) print *MPIR_proctable@4
$1 = {{host_name = 0x6723f0 "mic3", executable_name = 0x6723d0
"./wave_f.exe", pid = 107847}, {host_name = 0x6723b0 "mic3",
executable_name = 0x672390 "./wave_f.exe", pid = 107847}, {
host_name = 0x672370 "mic3", executable_name = 0x672350
"./wave_f.exe", pid = 107847}, {host_name = 0x672330 "mic3",
executable_name = 0x672310 "./wave_f.exe", pid = 107847}}
(gdb)
As you can see, MPIR_proctable reports the same pid for every rank, when
in reality each rank has its own pid:
-bash-4.1$ ps aux | grep 'wave_f.exe'
jbray 107841 0.3 0.0 99896 18772 pts/2 S+ 10:57 0:00 gdb --args mpirun -np 4 ./wave_f.exe
jbray 107843 0.0 0.0 23144 1288 pts/2 T 10:57 0:00 /home/jbray/prog/mpich/mpich-3.0.3/mic_gnu/install/bin/mpirun -np 4 ./wave_f.exe
jbray 107847 0.0 0.0 46488 1504 ? Ss 10:57 0:00 ./wave_f.exe
jbray 107848 0.0 0.0 29332 1472 ? Ss 10:57 0:00 ./wave_f.exe
jbray 107849 0.0 0.0 29332 1472 ? Ss 10:57 0:00 ./wave_f.exe
jbray 107850 0.0 0.0 29332 1472 ? Ss 10:57 0:00 ./wave_f.exe
cjanuary 107870 0.0 0.0 103244 864 pts/6 S+ 10:57 0:00 grep ./wave_f.exe
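For anyone who wants to check for this symptom programmatically, here is a
minimal sketch in C. It is not code from MPICH or DDT. The struct simply
mirrors the three fields gdb printed above (host_name, executable_name,
pid), and count_duplicate_pids and same_host are hypothetical helper names
made up for this illustration. It counts entries whose (host, pid) pair
already appeared earlier in the table, since pids are only unique per host.

```c
#include <stddef.h>
#include <string.h>

/* One process table entry, with the fields shown in the gdb output above. */
typedef struct {
    char *host_name;
    char *executable_name;
    int pid;
} MPIR_PROCDESC;

/* Hypothetical helper: treat two host names as equal if both are NULL
   or both compare equal as strings. */
static int same_host(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: return how many entries repeat the (host, pid)
   pair of an earlier entry. A healthy table should give 0. */
size_t count_duplicate_pids(const MPIR_PROCDESC *table, size_t n)
{
    size_t dups = 0;
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (table[i].pid == table[j].pid &&
                same_host(table[i].host_name, table[j].host_name)) {
                dups++;
                break; /* count each repeated entry once */
            }
        }
    }
    return dups;
}
```

With the four entries printed by gdb above (all pid 107847 on mic3) this
would report 3 repeated entries, whereas the pids that ps shows (107847 to
107850) would report 0.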
Regards,
Chris January - VP Engineering - Allinea Software Ltd.