[mpich-discuss] Code works with -ppn, fails without using MPICH 3.2
Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
matthew.thompson at nasa.gov
Tue Sep 5 10:28:01 CDT 2017
An update!
I started reading all the MPICH wiki pages I could find and thought I
should try -hosts or -f, and that *does* work:
> (1024) $ mpirun -f machinefile -np 96 ./GEOSgcm.x
> srun.slurm: cluster configuration lacks support for cpu binding
>
> In MAPL_Shmem:
> NumCores per Node varies from 12 to 28
> NumNodes in use = 4
> Total PEs = 96
>
So, I guess the answer is that MPICH 3.2 can't quite decode the SLURM
environment to figure out a machinefile, so I need to make one myself.
Would this be the best way to do this, or is there a way to
build/configure MPICH to better support this?
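For anyone curious, something along these lines is what I mean by making one
myself. A minimal sketch: the filename is arbitrary, the ":28" per-host rank
cap is just illustrative, and the "host:N" machinefile form is my reading of
the Hydra docs:
  # one entry per allocated node, capped at 28 ranks each
  scontrol show hostnames "$SLURM_JOB_NODELIST" | sed 's/$/:28/' > machinefile
  mpirun -f machinefile -np 96 ./GEOSgcm.x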
Next up: trying to figure out how to get InfiniBand supported, as I think
I'm currently using TCP:
> (1108) $ mpichversion | grep Device
> MPICH Device: ch3:nemesis
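If I'm reading the configure docs right, picking up InfiniBand means
rebuilding with a non-TCP netmod. A sketch of what I may try next; the paths
are placeholders and whether mxm or ofi/libfabric is the right netmod for our
fabric is a guess on my part:
  ./configure --prefix=$SWDEV/MPI/mpich/3.2/intel_17.0.4.196 \
      --with-device=ch3:nemesis:mxm --with-mxm=/path/to/mxm \
      CC=icc CXX=icpc FC=ifort F77=ifort
  # or, going through libfabric instead:
  #   --with-device=ch3:nemesis:ofi --with-libfabric=/path/to/libfabric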
On 09/05/2017 10:34 AM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND
APPLICATIONS INC] wrote:
> All,
>
> I've been evaluating different MPI stacks on our cluster and found that
> MPICH 3.2 does really well on some simple little benchmarks. It also
> runs Hello World just fine, so I decided to apply it to our climate
> model (GEOS).
>
> However, the first time I did that, things went a bit nuts. Essentially:
>
>> (1065) $ mpirun -np 96 ./GEOSgcm.x |& tee withoutppn.log
>> srun.slurm: cluster configuration lacks support for cpu binding
>> Fatal error in PMPI_Comm_create: Unknown error class, error stack:
>> PMPI_Comm_create(564).................:
>> MPI_Comm_create(MPI_COMM_WORLD, group=0x88000000, new_comm=0x106d1740)
>> failed
>> PMPI_Comm_create(541).................:
>> MPIR_Comm_create_intra(215)...........:
>> MPIR_Get_contextid_sparse_group(500)..:
>> MPIR_Allreduce_impl(764)..............:
>> MPIR_Allreduce_intra(257).............:
>> allreduce_intra_or_coll_fn(163).......:
>> MPIR_Allreduce_intra(417).............:
>> MPIDU_Complete_posted_with_error(1137): Process failed
>> MPIR_Allreduce_intra(417).............:
>> MPIDU_Complete_posted_with_error(1137): Process failed
>> MPIR_Allreduce_intra(268).............:
>> MPIR_Bcast_impl(1452).................:
>> MPIR_Bcast(1476)......................:
>> MPIR_Bcast_intra(1287)................:
>> MPIR_Bcast_binomial(310)..............: Failure during collective
>
> (NOTE: The srun.slurm thing is just an error/warning we always get.
> Doesn't matter if it's MPT, Open MPI, MVAPICH2, Intel MPI...it happens.)
>
> The thing is, it works just fine at (NX-by-NY) of 1x6 and 2x12, but once
> I go to 3x18, boom, collapse. As I am on 28-core nodes, my first thought
> was it was due to crossing nodes. But, those benchmarks I ran did just
> fine for 192 nodes, so...hmm.
>
> Out of desperation, I finally wondered: what if it's the fact that 28
> doesn't evenly divide 96? So I passed in -ppn:
>
>> (1068) $ mpirun -ppn 12 -np 96 ./GEOSgcm.x |& tee withppn.log
>> srun.slurm: cluster configuration lacks support for cpu binding
>>
>> In MAPL_Shmem:
>> NumCores per Node = 12
>> NumNodes in use = 8
>> Total PEs = 96
>> ...
>
> Starts up just fine! Note that every other MPI stack (MPT, Intel MPI,
> MVAPICH2, and Open MPI) handles the non-ppn type job just fine, but it's
> possible that they are evenly distributing the processes themselves. And
> the "MAPL_Shmem" lines you see are just reporting what the process
> structure looks like. I've added some print statements including this:
>
> ! Use the communicator passed in, if any; otherwise fall back to MPI_COMM_WORLD
> if (present(CommIn)) then
>    CommCap = CommIn
> else
>    CommCap = MPI_COMM_WORLD
> end if
>
> ! Only call mpi_init ourselves when no communicator was supplied
> if (.not. present(CommIn)) then
>    call mpi_init(status)
>    VERIFY_(STATUS)
> end if
> write (*,*) "MPI Initialized."
>
> So, boring: CommIn is *not* present, so we are using MPI_COMM_WORLD, and
> mpi_init is called as one would expect. Now if I run:
>
> mpirun -np 96 ./GEOSgcm.x | grep 'MPI Init' | wc -l
>
> to count the number of processes that initialized, I get results like 40,
> 56, 56, 45, 68 across multiple runs. Never consistent.
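A quick way I've found to sanity-check the launch itself, independent of
GEOS, is to count ranks per node with the plain system hostname command:
  mpirun -np 96 hostname | sort | uniq -c
If the launch were healthy I'd expect those counts to sum to 96 on every run.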
>
> So, I'm a bit at a loss. I freely admit I might have built MPICH 3
> incorrectly; it was essentially my first time building it. I configured with:
>
>> ./configure --prefix=$SWDEV/MPI/mpich/3.2/intel_17.0.4.196 \
>> --disable-wrapper-rpath CC=icc CXX=icpc FC=ifort F77=ifort \
>> --enable-fortran=all --enable-cxx |& tee configure.intel_17.0.4.196.log
>
> which might be too vanilla for a SLURM/InfiniBand cluster, and yet it
> works with -ppn. But maybe I need extra options for it to work in all
> cases? --with-ibverbs? --with-slurm?
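One option I've since seen in the MPICH documentation for SLURM clusters is
to skip Hydra entirely: build against SLURM's PMI and launch with srun. A
sketch, assuming SLURM's headers and PMI library are where configure can
find them:
  ./configure --prefix=$SWDEV/MPI/mpich/3.2/intel_17.0.4.196 \
      --with-pm=none --with-pmi=slurm \
      CC=icc CXX=icpc FC=ifort F77=ifort
  # then launch without mpirun:
  srun -n 96 ./GEOSgcm.x
I don't know yet whether that fixes the uneven placement, but it would at
least let SLURM own the process layout.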
>
> Any ideas on what's happening and what I might have done wrong?
>
> Thanks,
> Matt
--
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson