[mpich-discuss] Code works with -ppn, fails without using MPICH 3.2

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Tue Sep 5 09:34:51 CDT 2017


All,

I've been evaluating different MPI stacks on our cluster and found that 
MPICH 3.2 does really well on some simple little benchmarks. It also 
runs Hello World just fine, so I decided to apply it to our climate 
model (GEOS).

However, the first time I did that, things went a bit nuts. Essentially:

> (1065) $ mpirun -np 96 ./GEOSgcm.x |& tee withoutppn.log
> srun.slurm: cluster configuration lacks support for cpu binding
> Fatal error in PMPI_Comm_create: Unknown error class, error stack:
> PMPI_Comm_create(564).................: MPI_Comm_create(MPI_COMM_WORLD, group=0x88000000, new_comm=0x106d1740) failed
> PMPI_Comm_create(541).................: 
> MPIR_Comm_create_intra(215)...........: 
> MPIR_Get_contextid_sparse_group(500)..: 
> MPIR_Allreduce_impl(764)..............: 
> MPIR_Allreduce_intra(257).............: 
> allreduce_intra_or_coll_fn(163).......: 
> MPIR_Allreduce_intra(417).............: 
> MPIDU_Complete_posted_with_error(1137): Process failed
> MPIR_Allreduce_intra(417).............: 
> MPIDU_Complete_posted_with_error(1137): Process failed
> MPIR_Allreduce_intra(268).............: 
> MPIR_Bcast_impl(1452).................: 
> MPIR_Bcast(1476)......................: 
> MPIR_Bcast_intra(1287)................: 
> MPIR_Bcast_binomial(310)..............: Failure during collective

(NOTE: The srun.slurm message is just an error/warning we always get. 
It doesn't matter if it's MPT, Open MPI, MVAPICH2, or Intel MPI...it happens.)

The thing is, it works just fine at an NX-by-NY layout of 1x6 and 2x12, but 
once I go to 3x18, boom, collapse. As I am on 28-core nodes, my first 
thought was that it was due to crossing nodes. But those benchmarks I ran 
did just fine for 192 nodes, so...hmm.
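
(Side note: one way to see where hydra is actually putting the ranks, with 
GEOS out of the picture entirely, should be something along the lines of

   mpirun -np 96 hostname | sort | uniq -c

assuming a plain hostname makes it through our srun.slurm wrapper. My guess 
is that the default layout packs each 28-core node full, i.e. 28/28/28/12 
for 96 ranks, whereas -ppn 12 spreads them 12 per node across 8 nodes.)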

Out of desperation, I finally thought: what if it's the fact that 28 
doesn't divide 96 evenly? So I passed in -ppn and:

> (1068) $ mpirun -ppn 12 -np 96 ./GEOSgcm.x |& tee withppn.log
> srun.slurm: cluster configuration lacks support for cpu binding
>  
>  In MAPL_Shmem:
>      NumCores per Node =           12
>      NumNodes in use   =            8
>      Total PEs         =           96
> ...

Starts up just fine! Note that every other MPI stack (MPT, Intel MPI, 
MVAPICH2, and Open MPI) handles the non-ppn job just fine, but it's 
possible that they are evenly distributing the processes themselves. The 
"MAPL_Shmem" lines you see above just report what the process layout 
looks like. I've added some print statements, including this:

    ! Use the supplied communicator if one was passed in, else MPI_COMM_WORLD
    if (present(CommIn)) then
        CommCap = CommIn
    else
        CommCap = MPI_COMM_WORLD
    end if

    ! Only call mpi_init ourselves if no communicator was passed in
    if (.not.present(CommIn)) then
        call mpi_init(status)
        VERIFY_(STATUS)
    end if
    write (*,*) "MPI Initialized."

So, nothing exciting: CommIn is *not* present, so we are using 
MPI_COMM_WORLD, and mpi_init is called as one would expect. Now, if I run:

   mpirun -np 96 ./GEOSgcm.x | grep 'MPI Init' | wc -l

to count how many processes report that they initialized, then across 
multiple runs I get results like 40, 56, 56, 45, 68. Never consistent.
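
(If I'm reading the hydra docs right, mpiexec also has a -prepend-rank 
option (or whatever the equivalent is called in mpiexec --help) that tags 
each output line with its rank, so something like

   mpirun -prepend-rank -np 96 ./GEOSgcm.x |& grep 'MPI Initialized' | sort > started_ranks.log

would show *which* ranks got that far, not just how many, in case the 
missing ones all turn out to live on particular nodes.)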

So, I'm a bit at a loss. I freely admit I might have built MPICH 3.2 
incorrectly; it was essentially my first time building it. I configured with:

>  ./configure --prefix=$SWDEV/MPI/mpich/3.2/intel_17.0.4.196 \
>     --disable-wrapper-rpath CC=icc CXX=icpc FC=ifort F77=ifort \
>     --enable-fortran=all --enable-cxx |& tee configure.intel_17.0.4.196.log

which might be too vanilla for a SLURM/InfiniBand cluster, and yet it 
works with -ppn. But maybe I need extra options for it to work in all 
cases? --with-ibverbs? --with-slurm?
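
For what it's worth, this is the kind of thing I was imagining, based on my 
(quite possibly wrong) reading of the README and wiki. The exact option 
spellings would need checking against ./configure --help for 3.2, and I 
don't know which netmod is the right one for our Mellanox cards:

   # Sketch only; option names unverified against 3.2's ./configure --help.
   # ch3:nemesis:mxm would use Mellanox MXM for InfiniBand instead of TCP,
   # and --with-pmi=slurm --with-pm=none would use SLURM's PMI, which (as I
   # understand it) means launching with srun rather than mpiexec.
   ./configure --prefix=$SWDEV/MPI/mpich/3.2/intel_17.0.4.196 \
       --with-device=ch3:nemesis:mxm \
       --with-pmi=slurm --with-pm=none \
       --disable-wrapper-rpath CC=icc CXX=icpc FC=ifort F77=ifort \
       --enable-fortran=all --enable-cxx |& tee configure.intel_17.0.4.196.log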

Any ideas on what's happening and what I might have done wrong?

Thanks,
Matt
-- 
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC,    Global Modeling and Assimilation Office
Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
Phone: 301-614-6712                 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson