[mpich-discuss] Bug in HYDT_dbg_setup_procdesc

Pavan Balaji balaji at mcs.anl.gov
Tue Apr 30 07:54:14 CDT 2013


Gah.  Yes, good catch.  I've fixed it here:

http://git.mpich.org/mpich.git/commitdiff/a9640dc4

 -- Pavan

On 04/30/2013 04:59 AM US Central Time, Chris January wrote:
> Hello,
> 
> We (Allinea) have noticed a bug introduced in HYDT_dbg_setup_procdesc
> between 3.0.2 and 3.0.3 caused by this commit:
> 
> http://trac.mpich.org/projects/mpich/changeset/e04dd4b64ff618f2df58789265b741a8e9fab081/
> 
> When debugging a 4-process job on a 32-core machine using DDT, we find
> that the 4 entries in MPIR_proctable all have the same pid.
> 
> Here is how to reproduce the issue outside of DDT:
> 
> jbray@mic3:31053% gdb --args mpirun -np 4 wave_f.exe
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/jbray/prog/mpich/mpich-3.0.3/mic_gnu/install/bin/mpirun...done.
> (gdb) break MPIR_Breakpoint
> Breakpoint 1 at 0x428a70: file ./tools/debugger/debugger.c, line 25.
> (gdb) r
> Starting program: /home/jbray/prog/mpich/mpich-3.0.3/mic_gnu/install/bin/mpirun -np 4 wave_f.exe
> [Thread debugging using libthread_db enabled]
> Detaching after fork from child process 107606.
> 
> Breakpoint 1, MPIR_Breakpoint () at ./tools/debugger/debugger.c:25
> 25	}
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) print *MPIR_proctable@4
> $1 = {{host_name = 0x6723f0 "mic3", executable_name = 0x6723d0 "./wave_f.exe", pid = 107847},
>   {host_name = 0x6723b0 "mic3", executable_name = 0x672390 "./wave_f.exe", pid = 107847},
>   {host_name = 0x672370 "mic3", executable_name = 0x672350 "./wave_f.exe", pid = 107847},
>   {host_name = 0x672330 "mic3", executable_name = 0x672310 "./wave_f.exe", pid = 107847}}
> (gdb) 
> 
> As you can see, MPIR_proctable claims every rank has the same pid, when
> in reality they do not:
> 
> -bash-4.1$ ps aux | grep 'wave_f.exe'
> jbray    107841  0.3  0.0  99896 18772 pts/2    S+   10:57   0:00 gdb --args mpirun -np 4 ./wave_f.exe
> jbray    107843  0.0  0.0  23144  1288 pts/2    T    10:57   0:00 /home/jbray/prog/mpich/mpich-3.0.3/mic_gnu/install/bin/mpirun -np 4 ./wave_f.exe
> jbray    107847  0.0  0.0  46488  1504 ?        Ss   10:57   0:00 ./wave_f.exe
> jbray    107848  0.0  0.0  29332  1472 ?        Ss   10:57   0:00 ./wave_f.exe
> jbray    107849  0.0  0.0  29332  1472 ?        Ss   10:57   0:00 ./wave_f.exe
> jbray    107850  0.0  0.0  29332  1472 ?        Ss   10:57   0:00 ./wave_f.exe
> cjanuary 107870  0.0  0.0 103244   864 pts/6    S+   10:57   0:00 grep ./wave_f.exe
> 
> Regards,
> Chris January - VP Engineering - Allinea Software Ltd.
> 
> 
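
For anyone who wants to check for this symptom without a debugger, here is
a rough sketch of the kind of sanity check the report implies: scan a
process table and count entries that repeat an earlier (host, pid) pair.
The struct carries only the three fields visible in the gdb output above.
The names procdesc_t and count_duplicate_pids are made up for this
illustration and are not part of MPICH or Hydra.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same three fields that gdb prints for each MPIR_proctable entry.
 * procdesc_t is a hypothetical local stand-in, not the MPICH type. */
typedef struct {
    const char *host_name;
    const char *executable_name;
    int pid;
} procdesc_t;

/* Count entries whose (host_name, pid) pair already appeared earlier
 * in the table. Pids are only unique per host, so both are compared.
 * A healthy table yields 0. */
int count_duplicate_pids(const procdesc_t *table, int n)
{
    int dups = 0;

    assert(n == 0 || table != NULL);

    for (int i = 1; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (table[i].pid == table[j].pid &&
                strcmp(table[i].host_name, table[j].host_name) == 0) {
                dups++;
                break;  /* count each repeated entry only once */
            }
        }
    }
    return dups;
}
```

Fed the four entries gdb printed above (all "mic3", pid 107847), this
returns 3. Fed the pids that ps shows for the ranks (107847 through
107850), it returns 0.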

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


