[mpich-discuss] _get_addr error while running application using MPICH
Joseph Schuchart
schuchart at hlrs.de
Tue Nov 20 09:16:54 CST 2018
Zhifeng,
Another way to approach this is to start the application under gdb and
set a breakpoint on MPICH's internal abort function (MPID_Abort, iirc).
Once you hit it you can walk up the stack and try to find out where
_get_addr was found to be faulty. Since you are running with a single
process, starting under GDB should be straightforward:

$ gdb -ex "b MPID_Abort" -ex r ./real.exe

(If you pass arguments to real.exe you have to use --args with gdb.)
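
Once the breakpoint triggers, the inspection itself is just standard gdb;
roughly something like this (the comments describe what each command does,
the actual frames you see will depend on your build):

(gdb) bt            # print the full call stack at the abort
(gdb) up            # move up one frame; repeat until you reach your own code
(gdb) info locals   # inspect the variables in that frame

The frame just above the MPICH internals should point at the call in
real.exe (or in a library it links against) that tripped the _get_addr
check.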
Cheers
Joseph
On 11/19/18 1:49 PM, Zhifeng Yang via discuss wrote:
> Hi Hui,
>
> I just searched the whole code. There is no MPI_T_ name in the code. I
> may try the newer version later on. Thank you very much.
>
> Zhifeng
>
>
> On Mon, Nov 19, 2018 at 12:14 PM Zhou, Hui <zhouh at anl.gov> wrote:
>
> Hi Zhifeng,
>
> We just had a new mpich release: mpich-3.3rc1. You may try that
> release and see if you still have the same error.
>
> That aside, does your code use MPI_T_ interfaces? You may try searching
> for the MPI_T_ prefix in your code base. In particular, I am interested
> in any MPI_T_ calls before the MPI_Init call.
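>
> As a quick check, something along these lines should do (the path is only
> a placeholder for your source tree):
>
> $ grep -rn "MPI_T_" /path/to/your/source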
>
> --
> Hui Zhou
>
> On Mon, Nov 19, 2018 at 10:39:20AM -0500, Zhifeng Yang wrote:
> >Hi Hui,
> >Here are the outputs. I tried the following commands:
> >mpirun --version
> >./cpi
> >mpirun ./cpi
> >mpirun -np 1 ./cpi
> >
> >[vy57456 at maya-usr1 em_real]$mpirun --version
> >HYDRA build details:
> >    Version:                                 3.2.1
> >    Release Date:                            Fri Nov 10 20:21:01 CST 2017
> >    CC:                                      gcc
> >    CXX:                                     g++
> >    F77:                                     gfortran
> >    F90:                                     gfortran
> >    Configure options:                       '--disable-option-checking'
> >'--prefix=/umbc/xfs1/zzbatmos/users/vy57456/application/gfortran/mpich-3.2.1'
> >'CC=gcc' 'CXX=g++' 'FC=gfortran' 'F77=gfortran' '--cache-file=/dev/null'
> >'--srcdir=.' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=-lpthread ' 'CPPFLAGS=
> >-I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/mpl/include
> >-I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/mpl/include
> >-I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/openpa/src
> >-I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/openpa/src
> >-D_REENTRANT
> >-I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/mpi/romio/include'
> >'MPLLIBNAME=mpl'
> >    Process Manager:                         pmi
> >    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
> >    Topology libraries available:            hwloc
> >    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
> >    Checkpointing libraries available:
> >    Demux engines available:                 poll select
> >
> >
> >[vy57456 at maya-usr1 examples]$./cpi
> >Process 0 of 1 is on maya-usr1
> >pi is approximately 3.1415926544231341, Error is 0.0000000008333410
> >wall clock time = 0.000066
> >
> >
> >[vy57456 at maya-usr1 examples]$mpirun ./cpi
> >Process 0 of 1 is on maya-usr1
> >pi is approximately 3.1415926544231341, Error is 0.0000000008333410
> >wall clock time = 0.000095
> >
> >[vy57456 at maya-usr1 examples]$mpirun -np 1 ./cpi
> >Process 0 of 1 is on maya-usr1
> >pi is approximately 3.1415926544231341, Error is 0.0000000008333410
> >wall clock time = 0.000093
> >
> >There is no error.
> >
> >Zhifeng
> >
> >
> >On Mon, Nov 19, 2018 at 10:33 AM Zhou, Hui <zhouh at anl.gov> wrote:
> >
> >> On Mon, Nov 19, 2018 at 10:14:54AM -0500, Zhifeng Yang wrote:
> >> >Thank you for helping me with this error. Actually, real.exe is a
> >> >portion of a very large weather model. It is very difficult to extract
> >> >it or to duplicate the error in a simple Fortran code, since I am not
> >> >sure where the problem is. In fact, I can barely follow your discussion.
> >> >I do not even know what "_get_addr" is. Is it related to MPI?
> >>
> >> It is difficult to pinpoint the problem without reproducing it.
> >>
> >> Anyway, let's start with mpirun. What is your output if you try:
> >>
> >> mpirun --version
> >>
> >> Next, what is your mpich version? If you built mpich yourself, locate
> >> the `cpi` program in the examples folder and try `./cpi` and
> >> `mpirun ./cpi`. Do you get an error?
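> >>
> >> For example (the path below is only a placeholder for wherever you
> >> built mpich):
> >>
> >> $ cd /path/to/your/mpich-build/examples
> >> $ ./cpi
> >> $ mpirun ./cpi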
> >>
> >> --
> >> Hui Zhou
> >>
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>