[mpich-discuss] MPICH process freezing without any error message or warning

CEAM Meteorología ceamet at gmail.com
Tue Apr 11 05:57:18 CDT 2017


Hi MPICH users

I am trying to run a meteorological simulation with the RAMS model (
http://vandenheever.atmos.colostate.edu/vdhpage/rams.php) on a new cluster
with CentOS 7 on all nodes. Other applications run fine on the cluster and
distribute their processes to every node without problems.

But the RAMS model freezes at the first stage of a parallel run. The RAMS
developers recommend mpich2-1.4.1, which they have tested successfully.
If I run a parallel simulation on the master node only, it runs fine and
starts the required number of processes. If I try to use the other nodes,
the simulation freezes and does not stop with any error message. RAMS
processes appear on all the nodes in use, including the master node, but no
output is created and the usual RAMS status messages do not appear on screen.

The command line to run the model is
[paco at Llamp RUN]$ time ../misc/mpich2-1.4.1/bin/mpirun -verbose
-machinefile mpd.hosts -n 20 ./rams-6.2.03 -f RAMSIN
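
As a quick sanity check on the command above, the sketch below lists the
process slots declared in the machinefile, so they can be compared with the
value passed to -n. It assumes one "host" or "host:count" entry per line,
with "#" starting a comment. The helper names machinefile_slots and
machinefile_total are hypothetical, and the actual contents of mpd.hosts are
not shown in this message.

```shell
#!/bin/sh
# Hypothetical helper: print "host slots" for each machinefile entry.
# Assumption: one "host" or "host:count" entry per line; "#" starts a comment.
machinefile_slots() {
    sed -e 's/#.*//' "$1" | awk 'NF {
        n = split($1, part, ":")
        # An entry without ":count" is treated as a single slot.
        slots = (n > 1 && part[2] != "") ? part[2] : 1
        print part[1], slots
    }'
}

# Hypothetical helper: total number of slots declared in the machinefile.
machinefile_total() {
    machinefile_slots "$1" | awk '{ total += $2 } END { print total + 0 }'
}
```

After sourcing these functions, "machinefile_slots mpd.hosts" prints one line
per node and "machinefile_total mpd.hosts" prints the sum, which can then be
compared with the 20 processes requested.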

I have also compiled and tried to run with mpich 3.0.4 but it behaves
exactly the same way.

Log output from the -verbose option for both mpich2-1.4.1 and mpich 3.0.4 can
be found at:

MPICH2-1.4.1:
https://www.dropbox.com/s/6sgkarmsi5vrdfd/RAMS-mpich2-1.4.1.log?dl=0
MPICH3.0.4:
https://www.dropbox.com/s/bxvl5q6dy03pgew/RAMS-mpich2-3.0.4.log?dl=0

The cpi example runs fine with both MPICH installations.

Thanks in advance for your help and best regards