<div dir="ltr"><div dir="ltr">Hi Joseph,<div><br></div><div>I have not tried gdb yet. But I found a description in the initialization module. Here it says</div><div><br></div><div><br></div><div><div>! <DESCRIPTION></div><div>! This routine USES the modules in WRF and then calls the init routines</div><div>! they provide to perform module specific initializations at the</div><div>! beginning of a run. Note, this is only once per run, not once per</div><div>! domain; domain specific initializations should be handled elsewhere,</div><div>! such as in <a href=start_domain.html>start_domain</a>.</div><div>! </div><div>! Certain framework specific module initializations in this file are</div><div>! dependent on order they are called. For example, since the quilt module</div><div>! relies on internal I/O, the init routine for internal I/O must be</div><div>! called first. In the case of DM_PARALLEL compiles, the quilt module</div><div>! calls MPI_INIT as part of setting up and dividing communicators between</div><div>! compute and I/O server tasks. Therefore, it must be called prior to</div><div>! module_dm, which will <em>also</em> try to call MPI_INIT if it sees</div><div>! that MPI has not be initialized yet (implementations of module_dm</div><div>! should in fact behave this way by first calling MPI_INITIALIZED before</div><div>! they try to call MPI_INIT). If MPI is already initialized before the</div><div>! the quilting module is called, quilting will not work.</div><div>! </div><div>! The phase argument is used to allow other superstructures like ESMF to </div><div>! place their initialization calls following the WRF initialization call </div><div>! that calls MPI_INIT(). When used with ESMF, ESMF will call wrf_init() </div><div>! which in turn will call phase 2 of this routine. Phase 1 will be called </div><div>! earlier. </div><div>!</div><div>! </DESCRIPTION></div><div><br></div><div> INTEGER, INTENT(IN) :: phase ! phase==1 means return after MPI_INIT()</div><div> ! phase==2 means resume after MPI_INIT()</div></div><div><br></div><div>It mentions something about MPI_INIT, but I can not understand its meaning. It may help you to understand.</div><div><br></div><div>Best,</div><div>Zhifeng</div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Nov 20, 2018 at 10:17 AM Joseph Schuchart via discuss <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Zhifeng,<br>
<br>
Another way to approach this is to start the application under gdb and <br>
set a breakpoint on MPICH's internal abort function (MPID_Abort, iirc). <br>
Once you hit it you can walk up the stack and try to find out where <br>
_get_addr was found to be faulty. Since you are running with a single <br>
process, starting under GDB should be straightforward:<br>
<br>
$ gdb -ex "b MPID_Abort" -ex r ./real.exe<br>
<br>
(If you pass arguments to real.exe, you have to invoke gdb with --args.)<br>
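<br>
For illustration, an interactive session might look roughly like this (add<br>
--args and the program arguments to the gdb command line if real.exe takes<br>
any):<br>
<br>
$ gdb ./real.exe<br>
(gdb) break MPID_Abort<br>
(gdb) run<br>
<br>
Once the breakpoint triggers, "backtrace" prints the call chain that led to<br>
the abort, "frame N" moves to the Nth frame up the stack (e.g. the one inside<br>
the WRF code), and "info locals" shows the variables in that frame.<br>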
<br>
Cheers<br>
Joseph<br>
<br>
On 11/19/18 1:49 PM, Zhifeng Yang via discuss wrote:<br>
> Hi Hui,<br>
> <br>
> I just searched the whole code. There is no MPI_T_* name in the code. I<br>
> may try the newer version later on. Thank you very much.<br>
> <br>
> Zhifeng<br>
> <br>
> <br>
> On Mon, Nov 19, 2018 at 12:14 PM Zhou, Hui <<a href="mailto:zhouh@anl.gov" target="_blank">zhouh@anl.gov</a> <br>
> <mailto:<a href="mailto:zhouh@anl.gov" target="_blank">zhouh@anl.gov</a>>> wrote:<br>
> <br>
> Hi Zhifeng,<br>
> <br>
>     We just had a new mpich release: mpich-3.3rc1. You may try that<br>
>     release and see if you still have the same error.<br>
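>     <br>
>     For example, a rough sketch (the install prefix is just an<br>
>     illustration; reuse the configure flags from your 3.2.1 build):<br>
>     <br>
>     # download mpich-3.3rc1.tar.gz from https://www.mpich.org/downloads/<br>
>     tar xzf mpich-3.3rc1.tar.gz && cd mpich-3.3rc1<br>
>     ./configure --prefix=$HOME/apps/mpich-3.3rc1 CC=gcc CXX=g++ FC=gfortran F77=gfortran<br>
>     make && make install<br>
>     # then rebuild WRF/real.exe against this installation before re-testing<br>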
> <br>
>     That aside, does your code use MPI_T_ interfaces? You may try searching<br>
>     for the MPI_T_ prefix in your code base. In particular, I am interested<br>
>     in any MPI_T_ calls made before the MPI_Init call.<br>
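>     <br>
>     For example, a quick check from the top of the WRF source tree (just a<br>
>     sketch; adjust the path if you run it from elsewhere):<br>
>     <br>
>     grep -rIn "MPI_T_" .<br>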
> <br>
> -- <br>
> Hui Zhou<br>
> <br>
> On Mon, Nov 19, 2018 at 10:39:20AM -0500, Zhifeng Yang wrote:<br>
> >Hi Hui,<br>
> >Here are the outputs. I tried the following commands<br>
> >mpirun --version<br>
> >./cpi<br>
> >mpirun ./cpi<br>
> >mpirun -np 1 ./cpi<br>
> ><br>
> >[vy57456@maya-usr1 em_real]$mpirun --version<br>
> >HYDRA build details:<br>
> >    Version:                                 3.2.1<br>
> >    Release Date:                            Fri Nov 10 20:21:01 CST 2017<br>
> >    CC:                              gcc<br>
> >    CXX:                             g++<br>
> >    F77:                             gfortran<br>
> >    F90:                             gfortran<br>
> >    Configure options:               '--disable-option-checking' '--prefix=/umbc/xfs1/zzbatmos/users/vy57456/application/gfortran/mpich-3.2.1' 'CC=gcc' 'CXX=g++' 'FC=gfortran' 'F77=gfortran' '--cache-file=/dev/null' '--srcdir=.' 'CFLAGS= -O2' 'LDFLAGS=' 'LIBS=-lpthread ' 'CPPFLAGS= -I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/mpl/include -I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/mpl/include -I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/openpa/src -I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/openpa/src -D_REENTRANT -I/home/vy57456/zzbatmos_user/application/gfortran/source_code/mpich-3.2.1/src/mpi/romio/include' 'MPLLIBNAME=mpl'<br>
> >    Process Manager:                         pmi<br>
> >    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist<br>
> >    Topology libraries available:            hwloc<br>
> >    Resource management kernels available:   user slurm ll lsf sge pbs cobalt<br>
> >    Checkpointing libraries available:<br>
> >    Demux engines available:                 poll select<br>
> ><br>
> ><br>
> >[vy57456@maya-usr1 examples]$./cpi<br>
> >Process 0 of 1 is on maya-usr1<br>
> >pi is approximately 3.1415926544231341, Error is 0.0000000008333410<br>
> >wall clock time = 0.000066<br>
> ><br>
> ><br>
> >[vy57456@maya-usr1 examples]$mpirun ./cpi<br>
> >Process 0 of 1 is on maya-usr1<br>
> >pi is approximately 3.1415926544231341, Error is 0.0000000008333410<br>
> >wall clock time = 0.000095<br>
> ><br>
> >[vy57456@maya-usr1 examples]$mpirun -np 1 ./cpi<br>
> >Process 0 of 1 is on maya-usr1<br>
> >pi is approximately 3.1415926544231341, Error is 0.0000000008333410<br>
> >wall clock time = 0.000093<br>
> ><br>
> >There is no error.<br>
> ><br>
> >Zhifeng<br>
> ><br>
> ><br>
> >On Mon, Nov 19, 2018 at 10:33 AM Zhou, Hui <<a href="mailto:zhouh@anl.gov" target="_blank">zhouh@anl.gov</a><br>
> <mailto:<a href="mailto:zhouh@anl.gov" target="_blank">zhouh@anl.gov</a>>> wrote:<br>
> ><br>
> >> On Mon, Nov 19, 2018 at 10:14:54AM -0500, Zhifeng Yang wrote:<br>
> >> >Thank you for helping me with this error. Actually, real.exe is a<br>
> >> >portion of a very large weather model. It is very difficult to extract<br>
> >> >it or duplicate the error in a simple fortran code, since I am not sure<br>
> >> >where the problem is. In fact, I can barely follow your discussion.<br>
> >> >I do not even know what "_get_addr" is. Is it related to MPI?<br>
> >><br>
> >> It is difficult to pin-point the problem without reproducing it.<br>
> >><br>
> >> Anyway, let's start with mpirun. What is your output if you try:<br>
> >><br>
> >> mpirun --version<br>
> >><br>
> >> Next, what is your mpich version? If you built mpich, locate the `cpi`<br>
> >> program in the examples folder and try `./cpi` and `mpirun ./cpi`. Do<br>
> >> you get an error?<br>
> >><br>
> >> --<br>
> >> Hui Zhou<br>
> >><br>
> <br>
> <br>
_______________________________________________<br>
discuss mailing list <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
</blockquote></div>