[mpich-discuss] Code works with -ppn, fails without using MPICH 3.2
Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
matthew.thompson at nasa.gov
Tue Sep 5 12:05:33 CDT 2017
Ken,
I thought about that, but doesn't that mean I am stuck with srun as my
PM? I've never had great luck with srun compared to hydra (with other
MPI stacks).
I know I can't simply add your suggestion on top of my current (hydra) configure line, because:
configure: error: The PM chosen (hydra) requires the PMI implementation
simple but slurm was selected as the PMI implementation.
I am currently trying it with --with-pm=none and will test it, though.
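For reference, the two builds I'm planning to compare look roughly like this; the install prefix and Slurm path below are placeholders rather than my real paths:

    # srun as the launcher, built against Slurm's PMI (Ken's suggestion)
    ./configure --prefix=<install-dir> --with-pm=none --with-pmi=slurm \
        --with-slurm=<path/to/slurm/install> \
        CC=icc CXX=icpc FC=ifort F77=ifort --enable-fortran=all --enable-cxx

    # hydra as the PM, which (per the error above) requires the simple PMI;
    # if I understand right, this is what my vanilla configure defaults to
    ./configure --prefix=<install-dir> --with-pm=hydra --with-pmi=simple \
        CC=icc CXX=icpc FC=ifort F77=ifort --enable-fortran=all --enable-cxx
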
Matt
On 09/05/2017 11:21 AM, Kenneth Raffenetti wrote:
> It looks like you are using the Slurm launcher, but you might not have
> configured MPICH to use Slurm PMI. Try adding this to your configure line:
>
> --with-pmi=slurm --with-slurm=<path/to/slurm/install>
>
> Ken
>
> On 09/05/2017 09:34 AM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND
> APPLICATIONS INC] wrote:
>> All,
>>
>> I've been evaluating different MPI stacks on our cluster and found
>> that MPICH 3.2 does really well on some simple little benchmarks. It
>> also runs Hello World just fine, so I decided to apply it to our
>> climate model (GEOS).
>>
>> However, the first time I did that, things went a bit nuts. Essentially:
>>
>>> (1065) $ mpirun -np 96 ./GEOSgcm.x | & tee withoutppn.log
>>> srun.slurm: cluster configuration lacks support for cpu binding
>>> Fatal error in PMPI_Comm_create: Unknown error class, error stack:
>>> PMPI_Comm_create(564).................:
>>> MPI_Comm_create(MPI_COMM_WORLD, group=0x88000000,
>>> new_comm=0x106d1740) failed
>>> PMPI_Comm_create(541).................:
>>> MPIR_Comm_create_intra(215)...........:
>>> MPIR_Get_contextid_sparse_group(500)..:
>>> MPIR_Allreduce_impl(764)..............:
>>> MPIR_Allreduce_intra(257).............:
>>> allreduce_intra_or_coll_fn(163).......:
>>> MPIR_Allreduce_intra(417).............:
>>> MPIDU_Complete_posted_with_error(1137): Process failed
>>> MPIR_Allreduce_intra(417).............:
>>> MPIDU_Complete_posted_with_error(1137): Process failed
>>> MPIR_Allreduce_intra(268).............:
>>> MPIR_Bcast_impl(1452).................:
>>> MPIR_Bcast(1476)......................:
>>> MPIR_Bcast_intra(1287)................:
>>> MPIR_Bcast_binomial(310)..............: Failure during collective
>>
>> (NOTE: The srun.slurm thing is just an error/warning we always get.
>> Doesn't matter if it's MPT, Open MPI, MVAPICH2, Intel MPI...it happens.)
>>
>> The thing is, it works just fine at (NX-by-NY) of 1x6 and 2x12, but
>> once I go to 3x18, boom, collapse. As I am on 28-core nodes, my first
>> thought was it was due to crossing nodes. But, those benchmarks I ran
>> did just fine for 192 nodes, so...hmm.
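>>
>> (Doing the arithmetic, assuming NX*NY is the rank count: 1x6 = 6 and
>> 2x12 = 24 ranks both fit within a single 28-core node, while 3x18 = 54
>> is the first layout that has to span nodes, so the crossing-nodes
>> theory at least lines up with where it first breaks.)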
>>
>> Out of desperation, I finally thought: what if it's the fact that 28
>> doesn't evenly divide 96? So I passed in -ppn:
>>
>>> (1068) $ mpirun -ppn 12 -np 96 ./GEOSgcm.x |& tee withppn.log
>>> srun.slurm: cluster configuration lacks support for cpu binding
>>>
>>> In MAPL_Shmem:
>>> NumCores per Node = 12
>>> NumNodes in use = 8
>>> Total PEs = 96
>>> ...
>>
>> Starts up just fine! Note that every other MPI stack (MPT, Intel MPI,
>> MVAPICH2, and Open MPI) handles the non-ppn job just fine, but
>> it's possible that they are evenly distributing the processes
>> themselves. And the "MAPL_Shmem" lines you see are just reporting what
>> the process structure looks like. I've added some print statements
>> including this:
>>
>>    if (present(CommIn)) then
>>       CommCap = CommIn
>>    else
>>       CommCap = MPI_COMM_WORLD
>>    end if
>>
>>    if (.not.present(CommIn)) then
>>       call mpi_init(status)
>>       VERIFY_(STATUS)
>>    end if
>>    write (*,*) "MPI Initialized."
>>
>> So, boring: CommIn is *not* present, so we are using MPI_COMM_WORLD,
>> and mpi_init is called as one would expect. Now if I run:
>>
>> mpirun -np 96 ./GEOSgcm.x | grep 'MPI Init' | wc -l
>>
>> to count the number initialized, multiple times, I get results like:
>> 40, 56, 56, 45, 68. Never consistent.
>>
>> So, I'm a bit at a loss. I freely admit I might have built MPICH3
>> incorrectly. It was essentially my first time. I configured with:
>>
>>> ./configure --prefix=$SWDEV/MPI/mpich/3.2/intel_17.0.4.196 \
>>> --disable-wrapper-rpath CC=icc CXX=icpc FC=ifort F77=ifort \
>>> --enable-fortran=all --enable-cxx |& tee configure.intel_17.0.4.196.log
>>
>> which might be too vanilla for a SLURM/InfiniBand cluster, and yet it
>> works with -ppn. But maybe I need extra options for it to work in all
>> cases? --with-ibverbs? --with-slurm?
>>
>> Any ideas on what's happening and what I might have done wrong?
>>
>> Thanks,
>> Matt
--
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson