[mpich-discuss] MPICH process freezing without any error message or warning

Guo, Yanfei yguo at anl.gov
Tue Apr 11 19:36:17 CDT 2017


Hi,

MPICH 3.0.4 is quite an old version. Can you try your application with the latest MPICH release (3.2)? It is available at http://www.mpich.org/downloads/

Yanfei Guo
Postdoctoral Researcher
MCS Division, ANL


On 4/11/17, 5:57 AM, "CEAM Meteorología" <ceamet at gmail.com> wrote:

    Hi MPICH users
    
    I am trying to run a meteorological simulation with the RAMS model (http://vandenheever.atmos.colostate.edu/vdhpage/rams.php) on a new cluster with CentOS 7 on all nodes. We have other applications running fine on the cluster, sending processes to each node; everything seemed to run fine.
    
    But the RAMS model freezes at its first stage in the parallel run. The RAMS developers recommend using mpich2-1.4.1, as they have tested it successfully. If I run a parallel simulation on the master node only, it runs fine and starts the required number of processes. If I try to use the other nodes, the simulation freezes and does not stop with any error message; RAMS model processes appear on all the nodes used and on the master node, but no output is created and the usual status messages from RAMS do not appear on screen.
    
    The command line to run the model is
    
    [paco at Llamp RUN]$ time ../misc/mpich2-1.4.1/bin/mpirun -verbose -machinefile mpd.hosts -n 20 ./rams-6.2.03 -f RAMSIN
    
    I have also compiled RAMS against MPICH 3.0.4 and tried to run it, but it behaves in exactly the same way.
    
    Log messages with the -verbose option for both mpich2-1.4.1 and MPICH 3.0.4 can be found at
    
    MPICH2-1.4.1:
    https://www.dropbox.com/s/6sgkarmsi5vrdfd/RAMS-mpich2-1.4.1.log?dl=0
    
    MPICH 3.0.4:
    https://www.dropbox.com/s/bxvl5q6dy03pgew/RAMS-mpich2-3.0.4.log?dl=0
    
    The cpi example runs fine with both MPICH installations.
    
    Thanks in advance for your help and best regards

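One way to tell a silent hang, like the one described in the quoted message, apart from a launch failure is to wrap the launch in a time limit. The following is only a minimal sketch and is not part of the original thread: the helper name classify_run and the 60-second limit are hypothetical, and it assumes the GNU coreutils `timeout` command is available (it exits with status 124 when the limit is reached). Running a trivial command such as hostname through the same mpirun and machinefile is a suggested sanity check, not a confirmed diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report
# whether it finished (OK), timed out (HUNG), or failed (FAILED).
# Assumes GNU coreutils `timeout`, which returns 124 on timeout.
classify_run() {
    limit="$1"
    shift
    # Send the command's own stdout to stderr so that only the
    # verdict is printed on stdout.
    timeout "$limit" "$@" 1>&2
    status=$?
    case "$status" in
        0)   echo "OK" ;;
        124) echo "HUNG" ;;
        *)   echo "FAILED($status)" ;;
    esac
}

# Possible usage with the launcher and machinefile from the post
# (the 60-second limit is an arbitrary choice):
# classify_run 60 ../misc/mpich2-1.4.1/bin/mpirun -machinefile mpd.hosts -n 20 hostname
```

If the hostname run reports OK on every node while the RAMS run reports HUNG, the launcher itself is probably reaching the nodes, and the problem is more likely in the communication between the application processes.
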