[mpich-discuss] MPICH process freezing without any error message or warning

CEAM Meteorología ceamet at gmail.com
Wed Apr 12 02:31:41 CDT 2017


Hi

MPICH 3.0.4 was installed on the system because other applications have
been tested with this implementation. The RAMS model developers
specifically recommend mpich2-1.4.1, as they have tested it before and it
ran without any problem.

How can I debug or save logs for an MPICH application? Although I am not an
expert in MPICH or parallel computing, I would like (and need) to solve
this problem.
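
One general way to see where a hung MPI job is stuck (a rough sketch, not something taken from the RAMS or MPICH documentation) is to attach gdb to each frozen process and dump its backtraces, then compare the output across nodes. The Python helper below assumes Linux with /proc, gdb installed on every node, and permission to attach to the processes; the function names and the output file naming are made up for illustration.

```python
import os
import subprocess
import sys


def find_pids(name, proc_root="/proc"):
    """Return PIDs whose command name equals `name` (Linux /proc only)."""
    pids = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # no readable /proc on this system
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                # The kernel truncates the comm field to 15 characters.
                if fh.read().strip() == name[:15]:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited meanwhile or is not readable
    return sorted(pids)


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb command that prints all thread backtraces."""
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # process name, e.g. rams-6.2.03
    for pid in find_pids(target):
        result = subprocess.run(gdb_backtrace_command(pid),
                                capture_output=True, text=True)
        # Hypothetical naming scheme: one file per node and process.
        out_path = "bt.{}.{}.txt".format(os.uname().nodename, pid)
        with open(out_path, "w") as fh:
            fh.write(result.stdout + result.stderr)
        print("wrote", out_path)
```

It would be run on each node while the job is frozen, with the process name as the argument, for example `python dump_bt.py rams-6.2.03` (dump_bt.py being a hypothetical file name). The backtraces are easier to read if the model was compiled with -g.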

Thanks

On Wed, Apr 12, 2017 at 2:36, Guo, Yanfei (<yguo at anl.gov>) wrote:

> Hi,
>
> MPICH 3.0.4 is quite an old version. Can you try your application with the
> latest MPICH (3.2) from http://www.mpich.org/downloads/ ?
>
> Yanfei Guo
> Postdoctoral Researcher
> MCS Division, ANL
>
>
> On 4/11/17, 5:57 AM, "CEAM Meteorología" <ceamet at gmail.com> wrote:
>
>     Hi MPICH users
>
>
>
>     I am trying to run a meteorological simulation with the RAMS model
>     (http://vandenheever.atmos.colostate.edu/vdhpage/rams.php) on a new
>     cluster with CentOS 7 on all nodes. We have other applications running
>     fine on the cluster, sending processes to each node; everything seemed
>     to run fine.
>
>
>
>     But the RAMS model freezes at its first stage in the parallel run.
>     The RAMS developers recommend using mpich2-1.4.1, as they have tested
>     it successfully. If I run a parallel simulation on the master node
>     only, it runs fine and starts the required number of processes.
>     If I try to use the other nodes, the simulation freezes and does not
>     stop with any error message; RAMS model processes appear on all the
>     nodes used and on the master node, but no output is created and the
>     usual status messages from RAMS do not appear on screen.
>
>
>
>     The command line to run the model is
>
>     [paco at Llamp RUN]$ time ../misc/mpich2-1.4.1/bin/mpirun -verbose -machinefile mpd.hosts -n 20 ./rams-6.2.03 -f RAMSIN
>
>
>
>     I have also compiled and tried to run with MPICH 3.0.4, but it behaves
>     exactly the same way.
>
>
>
>     Log messages with the -verbose option for both mpich2-1.4.1 and
>     MPICH 3.0.4 can be found at:
>
>     MPICH2-1.4.1:
>     https://www.dropbox.com/s/6sgkarmsi5vrdfd/RAMS-mpich2-1.4.1.log?dl=0
>
>     MPICH3.0.4:
>     https://www.dropbox.com/s/bxvl5q6dy03pgew/RAMS-mpich2-3.0.4.log?dl=0
>
>
>
>     The cpi example runs fine with both MPICH installations.
>
>
>
>     Thanks in advance for your help and best regards
>
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss